#174987 - Kensai - Fri Aug 13, 2010 8:27 pm
Hi,
I'm looking for a "reverse" version of memcpy32(), which could be used to flip an image horizontally.
#174988 - Dwedit - Fri Aug 13, 2010 9:14 pm
That wouldn't flip an image though, since pixels are 16-bit, not 32-bit.
_________________
"We are merely sprites that dance at the beck and call of our button pressing overlord."
#174989 - Kensai - Fri Aug 13, 2010 9:23 pm
Yes, the function would have to flip the 4 bytes (I need it for mode 4).
#174990 - Dwedit - Fri Aug 13, 2010 9:55 pm
Just use DMA, with Source and Destination address going in reverse directions.
#174991 - Kensai - Fri Aug 13, 2010 11:13 pm
Thank you for the advice. But how do I flip the bytes? You can't set "Chunk Size" to 8 bit. The smallest option is DMA_16, and that only reverses the order of the halfwords:
[1][2][3][4] -> [3][4][1][2]
What I need is:
[1][2][3][4] -> [4][3][2][1]
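To make the difference concrete, here is a small host-side model in C. It is only a sketch: `copy_halfwords_reversed` and `copy_bytes_reversed` are made-up names, and the first one merely imitates what a 16-bit transfer with an incrementing source and a decrementing destination would produce. It does not touch any DMA registers.

```c
#include <stddef.h>
#include <stdint.h>

/* Model of a 16-bit copy with the destination running backwards:
 * the halfwords change order, but the two bytes inside each
 * halfword keep their original order. */
void copy_halfwords_reversed(const uint16_t *src, uint16_t *dst, size_t count)
{
    for (size_t i = 0; i < count; i++)
        dst[count - 1 - i] = src[i];
}

/* What a horizontal flip of 8-bit pixels actually needs:
 * every byte ends up in the mirrored position. */
void copy_bytes_reversed(const uint8_t *src, uint8_t *dst, size_t count)
{
    for (size_t i = 0; i < count; i++)
        dst[count - 1 - i] = src[i];
}
```

The byte-wise version is fine for a buffer in work RAM, but as far as I know VRAM does not handle 8-bit writes properly, so a real mode 4 routine has to swap the bytes in a register and store halfwords or words.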
#174992 - Dwedit - Sat Aug 14, 2010 1:13 am
Sorry, forgot that was an 8-bit mode, thought you were using 16 bit pixels...
Off the top of my head, untested, check it for bugs...
Code: |
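void copybackwards(u16 *src, u16* dest, int size) //size is in halfwords
{
    src += size-1;                      // start at the last source halfword
    while (size > 0)
    {
        int a = *src--;                 // read two pixels, walking backwards
        a = (a >> 8) | ((a & 0xFF)<<8); // swap the two 8-bit pixels
        *dest++ = a;                    // 16-bit store, upper bits are dropped
        size--;
    }
}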
|
post edited, forgot a left shift
#174998 - Kensai - Sat Aug 14, 2010 5:59 pm
Thank you. Do you think the code can be made faster using ASM?
#174999 - Dwedit - Sat Aug 14, 2010 9:13 pm
There's the 32-bit algorithm to swap bytes within a word:
Code: |
M1 = 00FF00FF
M2 = 0000FFFF
A = src[xxxx]
B = M1 & (A >> 8) //00FF00FF
C = A & M1
A = B | (C << 8)
B = M2 & (A >> 16) //0000FFFF
C = (A & M2)
A = B | (C << 16)
dest[xxxx] = A
|
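For reference, the 32-bit steps above could be written in C roughly like this. It is a sketch only: `swap_bytes32` and `copy_reversed32` are hypothetical names, and it assumes word-aligned buffers and a length given in whole words.

```c
#include <stddef.h>
#include <stdint.h>

/* Reverse the four bytes of a word with the two masks from the
 * pseudocode: first swap the bytes inside each halfword (M1),
 * then swap the two halfwords (M2). */
uint32_t swap_bytes32(uint32_t a)
{
    const uint32_t m1 = 0x00FF00FFu;
    const uint32_t m2 = 0x0000FFFFu;

    a = ((a >> 8) & m1) | ((a & m1) << 8);
    a = ((a >> 16) & m2) | ((a & m2) << 16);
    return a;
}

/* Copy `words` 32-bit words from src to dst in mirrored order,
 * byte-swapping each word on the way. */
void copy_reversed32(const uint32_t *src, uint32_t *dst, size_t words)
{
    for (size_t i = 0; i < words; i++)
        dst[words - 1 - i] = swap_bytes32(src[i]);
}
```

All stores are full words, which suits a mode 4 bitmap where each word holds four 8-bit pixels.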
And the 16 bit algorithm:
Code: |
A = src[xxxx]
B = (A << 8)
A = (A >> 8) | B
dest[xxxx] = A
|
Let's assume source is the cartridge and dest is VRAM, and this is a GBA.
The 32-bit version takes 20 cycles to copy and swap 4 bytes (4 more once you make it loop), so about 6 cycles per byte.
The 16-bit version takes 11 cycles to copy and swap 2 bytes (4 more once you make it loop), so about 7.5 cycles per byte.
Note: I never did get the hang of timing ARM instructions correctly, with the waitstates and all that, so there might be mistakes in these timings.
#175009 - Ruben - Sun Aug 15, 2010 12:37 pm
Building on Dwedit's post...
Code: |
@ Reverse memcpy.
@ Stores 16-bits
@ not very optimized, but
@ gets the job done AFAIK.
@ Cost is roughly 3 + 8x
@ where x is the number of
@ bytes to be copied.
@ ---
@ r0: dst
@ r1: src
@ r2: cnt [in bytes]
memcpyr16:
    subs    r2, #2                  @ 1  ( 1)
    ldrcsh  r3, [r1, r2]            @ ~6 ( 7)
    movcs   ip, r3, lsr #8          @ 1  ( 8)
    orrcs   r3, ip, r3, lsl #8      @ 1  ( 9)
    strcsh  r3, [r0], #2            @ ~4 (13)
    bhi     memcpyr16               @ 3  (16) loop only while bytes remain, so cnt = 0 exits
    bx      lr                      @ 3  (19)
|
EDIT: Fixed formatting and added timing + typo
#175020 - Miked0801 - Mon Aug 16, 2010 9:37 pm
Any chance this image is a sprite and you could use the X-Flip bit? Or perhaps a tile based BG in which you could also use the same bit? If it's a 3D layer or something that allows scaling, you could negative X scale to get this effect as well.
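If the hardware flip is an option, it comes down to a single bit. A rough C sketch, where the macro and function names are made up for illustration: for regular (non-affine) sprites the horizontal flip is bit 12 of OAM attribute 1, and for tiled backgrounds it is bit 10 of the screen entry.

```c
#include <stdint.h>

#define OBJ_ATTR1_HFLIP 0x1000u /* bit 12 of OAM attribute 1, regular sprites only */
#define BG_SE_HFLIP     0x0400u /* bit 10 of a tiled-background screen entry */

/* Return attr1 with the horizontal flip bit set or cleared. */
uint16_t obj_attr1_set_hflip(uint16_t attr1, int flip)
{
    if (flip)
        return (uint16_t)(attr1 | OBJ_ATTR1_HFLIP);
    return (uint16_t)(attr1 & ~OBJ_ATTR1_HFLIP);
}

/* Return the screen entry with its horizontal flip bit toggled. */
uint16_t bg_se_toggle_hflip(uint16_t se)
{
    return (uint16_t)(se ^ BG_SE_HFLIP);
}
```

These helpers only compute the new value. Writing it back to OAM or to the map in VRAM is left out, and on affine sprites bit 12 belongs to the affine index, so it cannot be used as a flip flag there.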
#175027 - Cearn - Wed Aug 18, 2010 8:36 pm
32bit-based version using some ROR <3.
Code: |
/*!
@function void memrcpy32(const void *src, void *dst, uint size);
Byte-reverse copies \a size/4 words from \a src to \a dst.
@param src Source pointer.
@param dst Destination pointer. Points to the START of the buffer.
@param size number of bytes to copy.
@note Kinda expects word-alignment for everything (for now).
*/
    .section .iwram, "ax", %progbits
    .arm
    .align
    .global memrcpy32
memrcpy32:
    bics    r2, r2, #3              @ word-align size,
    bxeq    lr                      @ and perhaps quick escape.
    stmfd   sp!, {r4}
    add     r1, r1, r2              @ point dst to its (not it's) tail.
    ldr     ip,=0x00FF00FF
.LrcpyLoop:
    ldmia   r0!, {r3}               @ r3: abcd ; r3= *src++;
    and     r4, ip, r3, ror #16     @ r4: 0d0b
    and     r3, ip, r3, ror #24     @ r3: 0c0a
    orr     r3, r3, r4, lsl #8      @ r3: dcba
    stmdb   r1!, {r3}               @ ; *--dst= r3;
    subs    r2, r2, #4
    bne     .LrcpyLoop
    ldmfd   sp!, {r4}
    bx      lr
|
Of course, this only works if everything is word aligned. If not, you'll have to account for the misalignments, as well as deal with the head and tail ending in the middle of a word.
If possible, use sprites for image flipping.
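For checking the assembly on a PC, the same ROR trick can be modelled in C. This is a sketch under the same word-alignment assumption, not a replacement for the routine: `ror32`, `swap_bytes_ror` and `memrcpy32_model` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Rotate right by n bits; n must be in the range 1..31. */
static uint32_t ror32(uint32_t x, unsigned n)
{
    return (x >> n) | (x << (32 - n));
}

/* Byte-reverse one word the way the loop body does. */
uint32_t swap_bytes_ror(uint32_t x)
{
    const uint32_t mask = 0x00FF00FFu;
    uint32_t lo = ror32(x, 16) & mask;  /* abcd -> 0d0b */
    uint32_t hi = ror32(x, 24) & mask;  /* abcd -> 0c0a */
    return hi | (lo << 8);              /* dcba */
}

/* Model of memrcpy32: src is read forwards, dst is filled from its
 * tail backwards. Trailing bytes beyond a multiple of 4 are ignored,
 * like the `bics r2, r2, #3` at the top of the routine. */
void memrcpy32_model(const void *src, void *dst, size_t size)
{
    const uint32_t *s = src;
    uint32_t *d = dst;
    size_t words = size / 4;

    d += words;                         /* point dst to its tail */
    while (words--)
        *--d = swap_bytes_ror(*s++);
}
```

Running both versions over the same buffer and comparing the results is a cheap way to catch mistakes before testing on hardware.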
#175030 - Ruben - Thu Aug 19, 2010 9:10 am
Can't you avoid the "add r1, r1, r2" by instead changing the loop to...
Code: |
.LrcpyLoop:
    subs    r2, r2, #4
    ldrcs   r3, [r0, r2]            @ r3: abcd ; r3= src[end--]
    andcs   r4, ip, r3, ror #16     @ r4: 0d0b
    andcs   r3, ip, r3, ror #24     @ r3: 0c0a
    orrcs   r3, r3, r4, lsl #8      @ r3: dcba
    stmcsia r1!, {r3}               @ ; *dst++ = r3;
    bhi     .LrcpyLoop
|
Or does that screw up the sequential access timing?