
Money Forum


Thoughts on CPU optimization




Woohoo! Hardware! Seriously, anything that makes me feel like I'm writing code for a c64 is, to me, a good thing. Richard, check this out. The guy goes through all sorts of tests and, in the end, determines "running memcpy for large transfers is probably a good thing." I love doing assembly tweaks and such. bitblt was one of the first things I looked at, but I really don't think there's much left to do there, except just to avoid calling it as much as one can.

---------------------

Dgemma

money-user




Statistics:
Messages: 2
Registration: 29.09.2010
28.05.22 - 00:37:06
Message # 1
RE: Thoughts on CPU optimization

Real-world testing 

---------------------

philon_68

money-user




Statistics:
Messages: 5
Registration: 08.09.2010
28.05.22 - 00:45:06
Message # 2
RE: Thoughts on CPU optimization

The code: Code:

---------------------

zinum

money-user




Statistics:
Messages: 71
Registration: 10.06.2008
28.05.22 - 00:50:22
Message # 3
RE: Thoughts on CPU optimization

Ouch! Are you sure about your results? Sorry to ask, but it seems too good to be true. Did you time the script with only AVISource and ConvertToRGB32?

---------------------
http://tinyurl.com/gddessotovik

akdaviaa

money-user




Statistics:
Messages: 8
Registration: 27.09.2009
28.05.22 - 01:01:42
Message # 4
RE: Thoughts on CPU optimization

The script was exactly the same in both cases. The only difference was that I commented out a different line and recompiled. After I decided it might be fast enough to be pushing network bandwidth, I tested with just AVISource: 0:11. Adding ConvertToRGB32 at your suggestion: 0:14. Back-of-the-envelope calculation: 640 x 480 x 4 x 300 x 20 / 230 = 32 MB/s with memcpy. Achieving 10X that should be very possible; my CPU's throughput with 100% cache hits should only be limited by memory bandwidth, which for DDR is 2.1 GB/s.
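
The back-of-the-envelope figure can be reproduced with a small helper. This is only a sketch: the function and parameter names are made up for illustration, and it assumes the numbers in the post mean 300 frames, 20 copies per frame and 230 seconds, which the post does not spell out. It reports decimal megabytes (10^6 bytes).

```cpp
#include <cstdint>

// Hypothetical helper: average copy throughput in decimal MB/s.
// Every parameter name here is an assumption about what the posted numbers mean.
constexpr double copy_throughput_mb_per_s(int width, int height, int bytes_per_pixel,
                                          std::int64_t frames, int copies_per_frame,
                                          double seconds) {
    return static_cast<double>(width) * height * bytes_per_pixel *
           static_cast<double>(frames) * copies_per_frame / seconds / 1e6;
}
```

Calling copy_throughput_mb_per_s(640, 480, 4, 300, 20, 230.0) gives roughly 32, matching the figure quoted in the post.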

---------------------

Narayana

money-user




Statistics:
Messages: 3
Registration: 17.01.2010
28.05.22 - 01:05:29
Message # 5
RE: Thoughts on CPU optimization

I thought the M$ VS6 compiler would already use a fast, machine-specific memcpy() if you let it, but I don't remember how you would specify this. - Tom

---------------------

Le Sya

money-user




Statistics:
Messages: 271
Registration: 07.08.2009
28.05.22 - 01:14:32
Message # 6
RE: Thoughts on CPU optimization

I've run some tests and my results differ from yours. I've made a filter which calls BitBlt 10 times (with the same source and destination), and I used BlankClip as the source. I get 640 x 480 x 4 x 2000 x 10 / 73 = 320 MB/s with memcpy() and about 500 MB/s using a simple MOVQ/MOVNTQ pair. (DDR2100)
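
For readers who want to try this kind of comparison, here is a minimal sketch of a copy loop built on non-temporal stores. It is not the poster's code: it swaps the MMX MOVQ/MOVNTQ pair for the SSE2 intrinsics _mm_loadu_si128 and _mm_stream_si128 (MOVDQU/MOVNTDQ), because those also compile for 64-bit targets. The name stream_copy is hypothetical, and falling back to memcpy for unaligned or odd-sized buffers is an assumption made to keep the example safe.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>
#define HAVE_SSE2 1
#endif

// Copies n bytes from src to dst.
// Streaming path: dst must be 16-byte aligned and n a multiple of 16.
// Anything else (or a non-SSE2 build) falls back to plain memcpy.
inline void stream_copy(void* dst, const void* src, std::size_t n) {
#ifdef HAVE_SSE2
    if (reinterpret_cast<std::uintptr_t>(dst) % 16 == 0 && n % 16 == 0) {
        __m128i* d = static_cast<__m128i*>(dst);
        const __m128i* s = static_cast<const __m128i*>(src);
        for (std::size_t i = 0; i < n / 16; ++i) {
            __m128i v = _mm_loadu_si128(s + i);  // normal (cached) load
            _mm_stream_si128(d + i, v);          // store that bypasses the cache
        }
        _mm_sfence();  // make the streamed stores visible before returning
        return;
    }
#endif
    std::memcpy(dst, src, n);
}
```

Whether this beats memcpy depends on the buffer size and on whether the destination is read again soon, so it is worth timing on the actual filter chain rather than trusting a synthetic loop.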

---------------------

Re_Volt

money-user




Statistics:
Messages: 8
Registration: 14.03.2008
28.05.22 - 01:20:19
Message # 7
RE: Thoughts on CPU optimization

Personally, I'm using a method of putting defines around different parts of the code. I do it by having the assembler code in a separate file that is included several times, but with different defines (and different method names depending on the defines). So far I haven't had any problems using this method.

I'm currently looking at places in Avisynth for important speedups, depending on how often they are used by ordinary people. I seem to remember finding that either ConvertToYUY2 or ConvertToRGB() is completely unoptimized, and since they are very often used, that is an obvious place to start. An optimized naive spatial blur (one that blurs no matter what, like the built-in Blur()) could also be quite useful. Any thoughts on other important Avisynth hot spots that would be nice to have optimized?

But since I still have the Avisynth smooth hiq and have plans for a deflicker filter for 50fps video for vdub, it may take a while. <sigh> ;)
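
The idea of stamping out several variants of one routine from a single body can be sketched in one file. Note that this is a stand-in, not the method from the post: a function-like macro takes the place of the separately included assembler file so that the example stays self-contained, and plain C++ takes the place of the assembler. The names DEFINE_AVERAGE, average_truncating and average_rounding are hypothetical.

```cpp
#include <cstdint>

// One body, several variants. In the multi-file version described in the
// post, the body would live in its own file and be pulled in with #include
// after setting different #defines; here a macro plays that role.
#define DEFINE_AVERAGE(NAME, ROUND)                                   \
    inline std::uint8_t NAME(std::uint8_t a, std::uint8_t b) {        \
        return static_cast<std::uint8_t>((a + b + (ROUND)) >> 1);     \
    }

DEFINE_AVERAGE(average_truncating, 0)  // variant 1: rounds down
DEFINE_AVERAGE(average_rounding, 1)    // variant 2: rounds half up

#undef DEFINE_AVERAGE
```

Each expansion gets its own function name, which mirrors the "different method names depending on the defines" part of the approach.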

---------------------

Pantera25

money-user




Statistics:
Messages: 384
Registration: 05.06.2009
28.05.22 - 01:28:29
Message # 8
RE: Thoughts on CPU optimization

I'm only doing the include trick in smooth hiq, since merge is too simple for that. I decided not to do an Athlon-specific function, mostly out of laziness, but the prefetch stuff is there: just #define ATHLON.

I was thinking of ConvertToYUY2. In the version I've got, there don't seem to be any optimizations for the RGB path. It includes a lot of muls, so properly placed pmaddwd instructions should be able to speed up this code somewhat, and memory access should also be reduced greatly by using the MMX registers properly. The fractions should, however, be reduced to 15 bits instead of 16 bits to avoid 16-bit signed overflows.

Regarding prefetch, it can be quite tricky. Using mov can be a bad thing, since it stalls until the read is completed. Prefetch does not stall if the data has not yet been retrieved. AMD has a prefetch routine in their optimization guide that (according to their claims) should be the best available.

Also, I'm quite reluctant to use movntq, since I almost always have to read again from the same cache line (the next pixel). So if I'm using movntq I get a heavy read penalty, since the data has to be fetched from memory every time. Is this a stupid assumption?
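
The point about 15-bit fractions can be made concrete with a small sketch. pmaddwd multiplies signed 16-bit words, so each coefficient has to fit in a signed 16-bit value: a weight of 0.587 scaled by 2^16 does not, while the same weight scaled by 2^15 does. The weights below are the full-range BT.601 luma weights, chosen only for illustration and not taken from Avisynth's source (blue is rounded down so the three sum to exactly 32768). The function names are hypothetical, and the code uses the SSE2 form of the instruction (_mm_madd_epi16) rather than the MMX one.

```cpp
#include <cstdint>

#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>
#define HAVE_SSE2 1
#endif

// A 16-bit fraction of the green weight would overflow a signed word.
static_assert(static_cast<int>(0.587 * 65536) > INT16_MAX, "16-bit fraction overflows");

// 15-bit fixed-point weights (assumed full-range BT.601, illustration only).
constexpr std::int16_t kB = 3735, kG = 19235, kR = 9798;
static_assert(kB + kG + kR == 32768, "weights sum to 1.0 in 15-bit fixed point");

// Scalar reference: weighted sum, rounded, scaled back down by 2^15.
inline std::uint8_t luma_scalar(std::uint8_t b, std::uint8_t g, std::uint8_t r) {
    std::int32_t sum = b * kB + g * kG + r * kR;
    return static_cast<std::uint8_t>((sum + 16384) >> 15);
}

// Same computation with one pmaddwd: words (b, g, r, 0) times (kB, kG, kR, 0).
inline std::uint8_t luma_madd(std::uint8_t b, std::uint8_t g, std::uint8_t r) {
#ifdef HAVE_SSE2
    __m128i px   = _mm_set_epi16(0, 0, 0, 0, 0, r, g, b);     // word 0 = b, 1 = g, 2 = r
    __m128i k    = _mm_set_epi16(0, 0, 0, 0, 0, kR, kG, kB);
    __m128i prod = _mm_madd_epi16(px, k);                     // dword 0 = b*kB + g*kG, dword 1 = r*kR
    std::int32_t lo = _mm_cvtsi128_si32(prod);
    std::int32_t hi = _mm_cvtsi128_si32(_mm_srli_si128(prod, 4));
    return static_cast<std::uint8_t>((lo + hi + 16384) >> 15);
#else
    return luma_scalar(b, g, r);  // fallback for builds without SSE2
#endif
}
```

A real conversion would process several pixels per instruction and keep the intermediate values in registers; this version handles one pixel at a time purely to show how the coefficients and the pairwise add line up.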

---------------------

svetlan4ik

money-user




Statistics:
Messages: 8
Registration: 22.08.2009
28.05.22 - 01:38:59
Message # 9
RE: Thoughts on CPU optimization

I can bring you an in-progress version of the filter so you can see how it's done. It's just an experiment, but it seems to work. Here's how: create another file (assembler_templates.cpp, for instance), include it in your project, but make VC++ exclude it from the build. Instead of adding the methods to your filter source, use: Code:

---------------------

extor

money-user




Statistics:
Messages: 3
Registration: 20.10.2007, 21:28
28.05.22 - 01:43:35
Message # 10
RE: Thoughts on CPU optimization