How AMD treats driver bug reports
Sorry to the non-German speakers; please use a translation website if you're curious.
  
My first inquiry:
Quote:
> Hi, I've coded a graphics demo, and it's been crashing the Catalyst
> driver ever since. The problem has not been fixed in v9.12. It's
> reproducible on every Vista/Win7 x86 and x64 system with an AMD GPU
> I've tested. Please be so kind as to download the demo here:
> http://blu-flame.org/files/bf-timeless.zip It should crash right after
> the loading progress bar reaches 100%. It's written in C++ and OpenGL.
> I'd gladly provide the source code, if needed.
Their reply:
Quote:
> I'll take the liberty of replying to you in German. I would gladly pass
> your demo on to our developers so that they can run the tests. For that,
> however, you would need to give us some details about the project and
> what exactly it is.
Then my detailed bug-report:
Quote:
Hello,
My project is a "realtime demo", a program that renders graphics and sound in real time, similar to the ones you also offer on your website: http://ati.amd.com/developer/demos/r9700.html
I haven't been able to determine where exactly the error lies. The symptoms are as follows: after the loading bar reaches 100%, it gets stuck there, and the graphics card's fan starts spinning fast. After about 5-7 seconds the mouse starts stuttering, then the whole screen goes black, the graphics card resets, and Windows reports in the system tray that the driver crashed and has been recovered.
By stepping through with the debugger, I found that the graphics driver crash happens immediately after *glDrawElements* is executed.
I don't get any OpenGL errors (*glGetError* returns *GL_NO_ERROR* at all times). My shaders also compile without problems.
Here is the link to the demo again: http://blu-flame.org/files/bf-timeless.zip
And as I said, if the source code is needed, I'll gladly release it.
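A side note on the glGetError check mentioned in the report: glGetError returns one recorded error flag per call, so a thorough check calls it in a loop until it returns GL_NO_ERROR. Below is a minimal sketch, assuming the real glGetError is passed in as `getError`; the names `drainGlErrors`, `GLenumT` and the `kGl...` constants are illustrative, not from the demo's source. A clean queue only means GL accepted the calls, not that the data they point to is valid.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Error codes as defined by the OpenGL specification. In a real program they
// come from <GL/gl.h>; they are redefined here only so the sketch compiles alone.
using GLenumT = std::uint32_t;
constexpr GLenumT kGlNoError = 0;
constexpr GLenumT kGlInvalidEnum = 0x0500;
constexpr GLenumT kGlInvalidValue = 0x0501;
constexpr GLenumT kGlInvalidOperation = 0x0502;
constexpr GLenumT kGlOutOfMemory = 0x0505;

// Hypothetical helper: drains every pending error flag. `getError` stands in
// for glGetError; maxIterations is a safety cap so a misbehaving
// implementation cannot make the loop spin forever.
std::vector<GLenumT> drainGlErrors(const std::function<GLenumT()>& getError,
                                   std::size_t maxIterations = 16) {
    std::vector<GLenumT> errors;
    for (std::size_t i = 0; i < maxIterations; ++i) {
        const GLenumT e = getError();
        if (e == kGlNoError) break;  // queue is empty
        errors.push_back(e);
    }
    return errors;
}
```

In a debug build you would call this after each GL call (or each frame) and log any non-empty result.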
And, after 4 weeks of silence:
Quote:
Wir haben Ihr Programm nun ausfuehrlich getestet und konnten die von Ihnen beschriebenen Fehler nicht feststellen. Die Maus stockte auch nach 10min nicht, es kam zu keinem Freeze oder Reboot, auch der Luefter der Grafikkarten drehte nicht hoch. Allerdings konnten wir Ihr Programm mit keiner Konfiguration zum korrekten laufen bringen. Es scheint also ein Fehler im Code vorzuliegen. Leider koennen wir keinen Support fuer Developer geben. Es scheint auf jeden Fall kein Problem mit Ihrer Grafikkarte vorzuliegen.
Support ticket closed.
OK that last reply is the interesting part, I've translated it for everyone:
Quote:
We have now fully tested your program and could not reproduce the errors you described. The mouse did not stutter even after 10 min, there was no freeze or reboot, and the graphics card's fan did not spin up. However, we weren't able to get your program to run correctly with any configuration. So it seems to be a bug in the code. Unfortunately, we can't provide support for developers. In any case, there doesn't seem to be any problem with your video card.
I tried to answer them, but it was pointless: support ticket closed. How the hell can you claim to have tested a demo that is only 4-5 minutes long (not 10 min), fail to notice the usual driver crash and reset, and then claim there are no bugs in the Catalyst driver when the program couldn't even start? And on top of that, they tell me my GPU is fine, even though I told them precisely that I tested it on different PCs with the same result. Now that's what I call fully tested.
maybe they used newer hardware?
  
Oh, hardy&decipher, you think I'll be back soon? With a hatetro most likely.
  
For the record, it crashes on my HD 4650. The compo PCs at Function also had HD 4xxx GPUs.
  

hmmmmm
Seriously, why haven't I seen this error message? iq's laptop also had XP with nVidia and it ran perfectly.
  
notorius: Seems like the OpenGL functions weren't imported. If it's trying to execute a null pointer, that can only be the result of wglGetProcAddress returning NULL.
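A sketch of the check this suggests: verify each extension entry point right after loading it, instead of crashing later on a call through a null pointer. The MSDN documentation says wglGetProcAddress returns NULL on failure, but some implementations are reported to return 1, 2, 3 or -1 instead, so the sketch rejects those too. The lookup is injected (`ProcLookup`, `isUsableProcAddress` and `findMissingProcs` are hypothetical names) so the logic compiles without windows.h; in the demo it would wrap wglGetProcAddress, called while a GL context is current.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

// Rejects NULL and the small sentinel values some wglGetProcAddress
// implementations are reported to return on failure.
bool isUsableProcAddress(const void* p) {
    const auto v = reinterpret_cast<std::intptr_t>(p);
    return !(v == 0 || v == 1 || v == 2 || v == 3 || v == -1);
}

// Hypothetical loader hook: on Windows this would wrap wglGetProcAddress
// (casting its PROC result to const void*); here it is injected for testing.
using ProcLookup = std::function<const void*(const char*)>;

// Returns the names of all entry points that failed to load, so the program
// can report them up front instead of later calling through a bad pointer.
std::vector<std::string> findMissingProcs(const ProcLookup& lookup,
                                          const std::vector<std::string>& names) {
    std::vector<std::string> missing;
    for (const auto& n : names) {
        if (!isUsableProcAddress(lookup(n.c_str()))) missing.push_back(n);
    }
    return missing;
}
```

If the list is non-empty, printing it and exiting gives a far more useful report than an access violation.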
  
Did you explain that it works fine on other hardware?
  
or maybe kick em in the nuts by explaining to them that it works fine on "real" hardware.
  
If they replied in German when you first wrote in English, you probably contacted your country's support (driver support is US-based) and/or got routed to the "wrong" support level, and I can easily guess why (honestly, your report is a bad one).
IMO, their fault is not giving you the right info on how to submit a new, proper request, one that can be forwarded to the right support level.
To get more attention, I would:
- cut the demo down to minimal source code that reproduces the error (something like 100 lines of code without any external libraries)
- send a new inquiry to driver support in the US, including this new source code, with some examples of hardware it crashes on and maybe some on which it runs smoothly (nVidia hardware is OK in this case). For every tested machine, include:
* CPU info
* GPU info (including manufacturer and PCB version, if possible)
* Driver version (including revisions)
* System version (including SPs and Hotfixes)
* Relevant system library versions
Hope it helps.
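Collecting the per-machine details listed above can be partly automated: once a context is current, glGetString(GL_VENDOR), glGetString(GL_RENDERER) and glGetString(GL_VERSION) identify the GPU and driver. The sketch below only formats already-collected values into a report; the `TestedSystem` struct and `formatReport` are hypothetical, and CPU/OS details would have to be gathered separately.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record for one tested machine, mirroring the checklist above.
struct TestedSystem {
    std::string cpu;
    std::string gpuVendor;      // from glGetString(GL_VENDOR)
    std::string gpuRenderer;    // from glGetString(GL_RENDERER)
    std::string glVersion;      // from glGetString(GL_VERSION); may include a driver build
    std::string driverVersion;  // e.g. Catalyst version including revision
    std::string osVersion;      // including service packs and hotfixes
    bool reproduces = false;    // did the crash happen on this machine?
};

// Formats all machines, crashing and non-crashing, into one plain-text block.
std::string formatReport(const std::vector<TestedSystem>& systems) {
    std::ostringstream out;
    for (const auto& s : systems) {
        out << (s.reproduces ? "[CRASH] " : "[OK]    ")
            << s.gpuRenderer << " (" << s.gpuVendor << ")\n"
            << "  CPU: " << s.cpu << "\n"
            << "  GL_VERSION: " << s.glVersion << "\n"
            << "  Driver: " << s.driverVersion << "\n"
            << "  OS: " << s.osVersion << "\n";
    }
    return out.str();
}
```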
  
I'm with pan: if you want any response, the only way is to give them a very, very minimal program that reproduces the problem; otherwise they are not going to make the effort to debug it, of course. If you give them something minimal along with the right info, I believe they (still) answer.
Not sure your crash is due to shaders or something else, but I found that 5 shaders in Timeless are not legal GLSL, so it won't run on ATI anyway. I would fix those before sending them anything again.
  
when does it crash? during wglCreateContext or wglMakeCurrent? (don't remember which exactly, been some time). if so, i can help you.
also, which packer are you using? crinkler?
  
if a QA department handed me even 150% of the info you supplied AMD with, i'd either close the bug or bounce it back to them too.
what iq said. supply specifics, these people generally are busy enough.
  
i bet "fully tested for 10 minutes" means they started your demo, watched some other monitors for a while, and after some time your demo ended and the system was still intact and running perfectly.
do as suggested, present a minimal example with source code. my bet is that the bug is within your code - even if you say that it used to work before upgrading the drivers. perhaps you relied on a flaw in the old drivers?
  
if you want it fixed you should make the effort to find the place where it crashes yourself - ask for help around here if necessary to find someone with the right config and locate the exact place where it crashes. 
if you can determine that it's failing with valid inputs, then you've got a case for their dev support.
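One concrete way to check that glDrawElements gets valid inputs: GL does not validate each index against the size of the bound vertex arrays, and without robust buffer access an out-of-range read is undefined behaviour that can take down the driver with no glGetError report. Whether that is what happens in Timeless is unknown; the sketch below is a CPU-side check you could run on the index data before uploading it, assuming 32-bit indices. `firstOutOfRangeIndex` and `drawRangeFits` are hypothetical helpers.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns the position of the first index that refers past the last vertex,
// or indices.size() if every index is valid (same convention as std::find_if).
std::size_t firstOutOfRangeIndex(const std::vector<std::uint32_t>& indices,
                                 std::size_t vertexCount) {
    auto it = std::find_if(indices.begin(), indices.end(),
                           [vertexCount](std::uint32_t i) { return i >= vertexCount; });
    return static_cast<std::size_t>(it - indices.begin());
}

// A draw reads `count` indices starting at element `first` of the index
// buffer; this checks that the whole range lies inside the buffer.
// Written to avoid overflow in first + count.
bool drawRangeFits(std::size_t first, std::size_t count, std::size_t indexBufferSize) {
    return first <= indexBufferSize && count <= indexBufferSize - first;
}
```

Running both checks on every mesh before the first draw either clears the inputs or points at the exact bad index.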
  
you guys seem awfully leaning towards defending the company's side, no matter what they do. AMD's response really was a quite poor one.
  
(What smash said basically)
Well, the job of the programmer is to make the job of support easier. You have the full source code; if you can't be bothered to spend the time narrowing it down to the actual sequence of valid operations leading to the crash, why should they do it for you?
At Funcom, when we find a crash in ATI drivers in Age of Conan, we don't send the 30 gig client. We provide a small executable with the minimum amount of code/shaders/assets required to show the issue.
If possible, you should also provide an unpacked, non-optimized build with full symbols.
  
wait, what? you got a reply from AMD qa? =)
  
it shouldn't take more than a few minutes to find the exact function it crashes in. and of course, preferably run it in a debug build and check whether it crashes there too (if not, it's usually some uninitialized variables).
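On the uninitialized-variables point: a debug build often initializes memory differently than a release build, so reading a member that was never set can behave differently in each. A minimal sketch of one defensive habit, using a hypothetical `MeshDrawState` struct: C++11 default member initializers give every instance a known state, and the draw is skipped unless setup actually completed.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical per-mesh draw state. Default member initializers apply for
// stack, heap and member instances alike, in debug and release builds.
struct MeshDrawState {
    std::uint32_t vertexBuffer = 0;  // 0 is never a name returned by glGenBuffers
    std::uint32_t indexBuffer  = 0;
    std::int32_t  indexCount   = 0;
    bool          uploaded     = false;
};

// Skipping a draw is safer than issuing one with garbage counts or buffer names.
bool readyToDraw(const MeshDrawState& m) {
    return m.uploaded && m.vertexBuffer != 0 && m.indexBuffer != 0 && m.indexCount > 0;
}
```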
  
and btw, if you arent a registered AMD/ATI developer - good luck. :)
  
OK, I admit I could have made it a bit easier for them. I'll try to make a small driver-crash example. I think I'll use a GL debugger (gDEBugger, glslDevil, GLIntercept, ...) to record all OpenGL commands issued up to the crash, and write an application that executes them linearly.
@ryg: Read it more carefully, and maybe download it? PC demo, 19 MB. glDrawElements.
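The record-and-replay plan can be pushed one step further toward the minimal repro asked for above: binary-search for the shortest prefix of the recorded call log that still crashes. A sketch, assuming each trial runs the first n calls in a fresh process (a driver reset can take the context down with it) and that the failure is monotonic in n. `RecordedCall`, `replayPrefix` and `shortestFailingPrefix` are hypothetical names, not part of gDEBugger, glslDevil or GLIntercept.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// One recorded GL call from a debugger log. `replay` is hypothetical: a real
// replayer would call the GL function with the logged arguments.
struct RecordedCall {
    std::string name;
    std::function<void()> replay;
};

// Executes the first n recorded calls in order ("executes them linearly").
void replayPrefix(const std::vector<RecordedCall>& calls, std::size_t n) {
    for (std::size_t i = 0; i < n && i < calls.size(); ++i) calls[i].replay();
}

// Hypothetical predicate: replays the first n calls in a fresh process and
// reports whether the crash happened.
using ReproducesFn = std::function<bool(std::size_t n)>;

// Binary search for the shortest crashing prefix. Assumes the full log
// crashes and that a crashing prefix stays crashing when extended.
std::size_t shortestFailingPrefix(std::size_t logSize, const ReproducesFn& reproduces) {
    std::size_t lo = 1, hi = logSize;  // invariant: prefix of length hi crashes
    while (lo < hi) {
        const std::size_t mid = lo + (hi - lo) / 2;
        if (reproduces(mid)) hi = mid; else lo = mid + 1;
    }
    return hi;  // the last call in this prefix triggers the crash
}
```

This takes about log2(N) trial runs instead of N. Removing irrelevant calls from the middle of the prefix is a separate, mostly manual step.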
  
Quote:
you guys seem awfully leaning towards defending the company's side, no matter what they do. AMD's response really was a quite poor one.
Quite a poor response to quite a poor report. When handling support tickets you rarely have time for detective work to determine whether the bug report is total rubbish, or whether the reporter is a clueless fart or a code guru. The more technical, straightforward and detailed your report is, the more likely it is to be forwarded from 1st-level support to a more technical support level.
The support answer technique is a standard SI/SO type of answer (=Shit-In/Shit-Out) ;)
if this crash is what i think it is, it shouldn't occur with an uncompressed executable.
ran into something similar some years ago with kkrunchy, and it turned out that the ogl runtime was trying to read the import table of the running process (don't ask me why) and choking on it. don't know what the fix was, but some fiddling with the import table layout in kkrunchy solved it.
  
ah, i remember, we had the problem with some of the CNS 64ks as well - if i recall correctly you had to zero out something, OriginalFirstThunk maybe?
  
...or it was zero and you had to fill in some bogus value? can't recall, i just know you mentioned it at the time...
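Nobody here remembers the exact fix, so this sketch doesn't attempt one. It only shows where OriginalFirstThunk lives, so you can check what your packer actually emits: the PE import directory is an array of 20-byte IMAGE_IMPORT_DESCRIPTOR entries terminated by an all-zero entry, with OriginalFirstThunk as the first field. `descriptorsWithoutNameTable` is a hypothetical helper working on a raw copy of the import directory.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Layout of one IMAGE_IMPORT_DESCRIPTOR as declared in winnt.h.
struct ImportDescriptor {
    std::uint32_t originalFirstThunk;  // RVA of the import name table (may be 0)
    std::uint32_t timeDateStamp;
    std::uint32_t forwarderChain;
    std::uint32_t name;                // RVA of the DLL name string
    std::uint32_t firstThunk;          // RVA of the import address table
};
static_assert(sizeof(ImportDescriptor) == 20, "descriptor must be 20 bytes");

// Walks the raw import directory up to the all-zero terminator and returns
// the indices of descriptors whose originalFirstThunk is 0.
std::vector<std::size_t> descriptorsWithoutNameTable(const std::uint8_t* dir, std::size_t bytes) {
    std::vector<std::size_t> result;
    const ImportDescriptor zero{};
    std::size_t index = 0;
    for (std::size_t off = 0; off + sizeof(ImportDescriptor) <= bytes;
         off += sizeof(ImportDescriptor), ++index) {
        ImportDescriptor d;
        std::memcpy(&d, dir + off, sizeof d);  // avoids unaligned reads
        if (std::memcmp(&d, &zero, sizeof d) == 0) break;  // terminator
        if (d.originalFirstThunk == 0) result.push_back(index);
    }
    return result;
}
```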