gbadev.org forum archive

I have the following code that calculates an IP checksum, as in the checksum found in an IP header. I have included the code. Can it be optimzed any better?

Thanks,
AH

b MainFunction

UnitTest1:
@DCW 0x0100, 0xF203, 0xF4F5, 0xF6F7
@DCW 0x0000, 0x0000, 0x0000, 0x0000 ;; ChkSum = 0x210E

UnitTest2:
@DCW 0x45C0, 0x0044, 0xad22, 0x0159
@DCW 0xAC14, 0x7143, 0xE000, 0x0005 ;; ChkSum = 0x0E22

UnitTest3:
@DCW 0x45C0, 0x0044, 0xDC15, 0x0159
@DCW 0xAC14, 0x7142, 0xE000, 0x0005 ;; ChkSum = 0xDF2F

MainFunction:

ldr R1,=UnitTest1 ;; This is the ADDRESS of my UnitTest Data
ldr R2,=8 ;; Size of the array of 16 bit words

mov R3,#0 ;; clear this guy out
ldr R4,=0xFFFF ;; MASK for lower 16 bits

lbl_AddTheData1:

ldr R5,[R1]+2!, ;; get value @ R1, put into R4, inc R1 by 2 - moves to next word
and R5, R5, R4 ;; only want lower 16 bits
add R3, R3, R5 ;; sum these values

subs R2, R2, 1 ;; dec by 1
bne lbl_AddTheData1 ;; Repeat until we hit the end of the array

lbl_AdditionComplete:

;;mov R2, R3, LSR #16 ;; The upper 16-bits are the carries
;;add R3, R3, R2 ;; Add the carries to the sum in R3
add R3, R3, R3 LSR #16 ;; The above two lines in one command

EOR R3, R3, R4 ;; NOT R3
and R3, R3, R4 ;; Only want the lower 16-bits

EndOfProgram:

B EndOfProgram
_________________
urpNO_SPAM_MAN@canerdian.ca
http://www.canerdian.ca

use LDRH for reading 16bit instead of 32bit.
_________________
I probably suck, my not is a programmer.

Here is what I have come up with. I tried it with the first data set (on paper), and it yeilds the correct results.

Code:

MainFunction: ; Optimized for speed and size
; User init.
ldr r1, =UnitTest1 ; Load our sample data
ldr r2, =8 ; The array size is 8

; Program init.
mov r3, #0 ; Reset our sub counter
mov r2, r2, lsl #2 ; Multiply the array size by two (get bytesize)

; Sum loop
loop:
subs r2, r2, #2 ; Go back two bytes
ldrh r5, [r1, r2] ; Load the data
sub r3, r3, r5 ; Subtract ("add") our data
bne loop ; Loop if not the first byte

; r3 now contains the negative sum of the data

add r3, r3, r3, lsl #16 ; Add the upper and lower half-words, storing it in the upper half-word
mov r3, r3, lsr #16 ; Shift the result back into place.

My major itching was that your code used masks. I didn't like this, so I used some fancy trickery to let the last two instructions do the masking for me. Also, this bit of code assumes that you are optimizing for both speed and size; if you want more speed, you can use 32-bit reads instead.

Edit: Fixed some typos.

Last edited by strager on Wed Feb 22, 2006 10:56 pm; edited 1 time in total

Hi and thanks for the suggestions. Strager, in your code, I cannot see where the actual data (in R5) is being used to generate the checksum? Also when I ran your code, using UnitTest3, I do not get the correct checksum.

Based on some experimenting and FluBBa's suggestion, I have came up with the following code.

MainFunction:

ldr R1,=UnitTest2 ;; 1st Argument, address of array of 16-bit values

ldr R2,=8 ;; 2nd Argument, length of array

mov R3,#0 ;; zero out - will hold the SUM of 16 bit array

ldr R4,=0xFFFF ;; MASK for lower 16 bits - used frequently

lbl_SummingTheArray:

ldrh R5,[R1]+2! ;; load the 16-bit value @ R1, inc R1 pointer by 2

add R3, R3, R5 ;; sum these values

subs R2, R2, 1 ;; dec by 1

bne lbl_SummingTheArray ;; Repeat until the end of the array

FinalStepIsAddingTheCarriesAndInverting:

;;mov R2, R3, LSR #16 ;; The upper 16-bits are the carries
;;add R3, R3, R2 ;; Add the carries to the sum in R3

add R3, R3, R3 LSR #16 ;; The above two lines in one command

;;EOR R3, R3, R4 ;; NOT R3
;;and R3, R3, R4 ;; Only want the lower 16-bits

;; The above two lines can be done with BIC
BIC R3, R4, R3 ;; NOT R3, then AND with R4, result in R3

EndOfProgram:
B EndOfProgram

aaronphughes wrote:

Hi and thanks for the suggestions. Strager, in your code, I cannot see where the actual data (in R5) is being used to generate the checksum? Also when I ran your code, using UnitTest3, I do not get the correct checksum.

strager's procedure is sound, only the 'r2' in the sub before the loop-branch should have been an 'r5'. Change that and it's fixed. This would be my own version, in GNU asm:

Code:

@ code tags are good for displaying code!

@ === u16 ipcrc(const void *data) CODE_IN_IWRAM; ======================
@ NOTE! GNU asm syntax.
@ reglist: r0-r3: data pointer, value, counter and sum

.section .iwram,"ax", %progbits
.align 2
.code 32
.global ipcrc
ipcrc:
mov r2, #8
mov r3, #0

@ Summing data; using -x instead of +x already inverts the sum
.Lcrc_loop:
   ldrh r1,[r0],#2
   sub r3, r3, r1
   subs r2, r2, #1
   bne .Lcrc_loop

@ add sum (lower halfword) to the carries (upper hw)
@ then shift sum+carries down and into the return register
add r3, r3, r3, lsl #16
mov r0, r3, lsr #16
bx lr

strager wrote:

... if you want more speed, you can use 32-bit writes instead.

You meant 'reads', right? Not sure that'd help, though: yes, the loads are quicker, but simply adding words will screw up the carries (at least that's what happened when I tested it) and masking to compensate defeats the purpose of loading words in the first place.

Another optimization would be to unroll the loop, but that's cheating. :P

I've worked it out to be a little faster without loop unrolling. This one reads two halfwords at a time, then adds those two together with the counter. Then, it traps the overflow and adds it to the counter if necessary.

Code:

.arm
.global ipsum

; u16 ipsum(u16 *data); - Finds the IPSUM of data
; Notes - If the halfword size is not even (x%2==1),
; the function hangs. No way to get around this unless
; you want to waste an extra four bytes and a few cycles.
; Current function cycles: 5 + size / 2 * (5 or 6)
; Old function cycles: 5 + size * 4

ipsum: ; Finds the sum of an IP address (?)
mov r2, #8 ; Halfword size
mov r3, #0 ; Counter

ipsum_loop: ; Main loop
ldr r1, [r0], #4 ; Load 2 halfwords
adds r1, r1, r1, lsl #16 ; Add them
sub r3, r3, r1, lsr #16 ; Sub from counter
subvs r3, r3, #0x00010000 ; Sub overflow
subs r2, r2, #2 ; Sub size
bne ipsum_loop ; Loop if size > 0

add r3, r3, r3, lsl #16 ; Add the two halfwords
mov r0, r3, lsr #16 ; Shift back

bx lr ; Return

I tried working it out on paper, and I got the same results (with only four iterations). Just be aware that the data must be 4-byte (perhaps 2-byte?) aligned, or the code may trip on itself. It only takes up 8 more bytes of space, so if size isn't what you are optimizing for, this may as well be the best you can do without loop unrolling.

Should the extra sub be on carry set? Or am I missing something?
_________________
I probably suck, my not is a programmer.

FluBBa wrote:

Should the extra sub be on carry set? Or am I missing something?

I'm not so sure of that. I assumed that "carry" meant that some bits of the number were modified because the bits left one on both sides were both one. E.g., 01 + 01 = 10, because both bit zeros are 1. Overflow means that the operations caused the number to go past the 31'nd bit, I think. Perhaps you were thinking of the logical operators, where when a bit is shifted off the register, the carry flag is set?

Maybe I am way off; I don't know, I haven't tried it.

Overflow means that a number goes from it's max signed value over to it's lowest signed value (or the other way around as well).

like this: 0x7F800000+0x00821220=0x80021220
Overflow will be set, the value goes from a positive to a negative by adding 2 positive values.
I don't have any good site which explains it better byt I suppose Google could help you.

Carry is used when the value wraps from its max unsigned value.
_________________
I probably suck, my not is a programmer.

gbadev.org forum archive

ASM > Can this be optimized better?

#72268 - aaronphughes - Fri Feb 17, 2006 8:04 am

#72277 - FluBBa - Fri Feb 17, 2006 9:18 am

#72821 - strager - Tue Feb 21, 2006 11:57 pm

#72959 - aaronphughes - Wed Feb 22, 2006 5:14 pm

#72977 - Cearn - Wed Feb 22, 2006 7:31 pm

#73223 - strager - Thu Feb 23, 2006 11:41 pm

#73611 - FluBBa - Sun Feb 26, 2006 10:16 pm

#73620 - strager - Sun Feb 26, 2006 11:42 pm

#74324 - FluBBa - Sat Mar 04, 2006 12:41 pm