I am already working 16 hours a day, what I need is someone that already knows this off the top of their head. Feasible / Infeasible ?
I already have MS Windows completely handled; the answer there is mouse_event(), keybd_event(), and SendInput(). What I am looking for is a solution for Mac OS and Linux. Oh yeah, I forgot: there are (or used to be) a few apps that took their input directly from the hardware, so I might not cover every single MS Windows app. One MS-DOS app that did this was PC Anywhere.
Good, what about sending keystrokes and mouse actions to Linux?
One of the applications that I am creating with my
Not sure if you are asking this question rhetorically, but yes you can, using XWD or, programmatically, using XGetImage on the root window.
The question is whether you intend to track all screen state, or take snapshots. If you intend to take only snapshots, then on both Windows and Unix you're looking at 20 lines of code or less.
I noticed that your application allows the specification of font family, font face, etc., when scanning for text. This has implications on the scope of what you are doing. Naturally, you're not proposing to be able to recognize any piece of "graphical text", are you?
-Le Chaud Lapin-
Okay great so the project is feasible.
I see. Let me clarify for the other readers: you are _not_ proposing to recognize any arbitrary "text". It is possible to render text in Windows using a font not known by the system, using bit-blt on an internally rendered bitmap. Certainly you would not be able to recognize a custom "dingbat font".
That said, I can vouch for the value of your application (if it actually works. :])
Most macro recorders today are somewhat dysfunctional. (Macro recorders monitor, in software, the input stream to Windows, and attempt to replay that stream after the computer has been rebooted, to get the application back into the state it assumed when a human entered the input.)
The reason they are broken has primarily to do with timing: when it comes time to play back the keystrokes, they have no idea how rapidly the application is responding to playback. So the macro recorder might replay input too fast, playing keystrokes that were meant for a window that has not come alive yet. Those keystrokes are then lost. The operator of the macro recorder will try to combat this by guessing how long it takes a window to be "born" and "come to life", waiting that amount of time before playing input to that window. But this is error-prone: raise or lower the CPU performance, and it breaks.
So Peter's application apparently takes a snapshot of the screen, finds all the windows, finds the title bars in the windows, edit boxes, etc., and presents that data when it is requested by a function that wants it. In this case, the operator of the macro recorder will no longer have to guess how long it takes windows to pop up. He will simply say, "Wait until there is a window with 'Google Talk' in the title bar."
If this is what you are doing, and you are not doing generalized OCR, which you said on your site you were not, then I am a bit puzzled, as it would be possible to do the same by interposing into the GUI subsystem of Windows. And unlike the frame-grab method, where you watch the pixels and therefore cannot "keep up" with state transitions on the screen, you would know pretty much the "exact" state of the screen at all times.
-Le Chaud Lapin-
Not only is it feasible, but the more I think about it, the more I realize that you should do this in software. The reason is that if you are already writing software that must exist on the host system, you already have control over what that system does.
There is no commercially available system where you cannot get as close to the hardware ports as possible in software. For example, on Windows, the keyboard hardware is controlled by a keyboard driver. Before a GUI window gets a pressed key, it must pass from this driver into another device driver called WIN32K.SYS, which receives the keys one by one in its raw input thread. So you have options: you can inject keystrokes using the Raw Input API, write a device driver (not trivial) that interposes between KBDHID.SYS and WIN32K.SYS, or write a driver that gets right up against the keyboard hardware and feeds KBDHID.SYS.
On Linux and other Unices, writing a device driver borders on trivial compared to Windows, so you could do the same thing there.
I would seriously reconsider doing this in hardware, since the amount of effort to get 99% of your market is significantly less in software.
The ratio of material cost for the hardware method versus the software method is effectively infinite, since the software method needs no parts at all.
-Le Chaud Lapin-
I can't tell what you are asking. If you are asking whether my technology can recognize text from screenshots, the answer is yes. Here are some more details:
Ok, so even though you're using a DFA in your algorithm, the overall model is still stochastic. I see "100% recognition" in many places, which, naturally, makes anyone skeptical. To get 100% recognition of arbitrary text, you would have to know a priori the Bezier sets not only of all currently known font families, but of those that have yet to be made, which seems absurd.
I think you should be clearer about the effectiveness of your tool and how it works. Instead of just saying "it recognizes", say a bit more. Since you already have a patent, it does not hurt to be more complete in your description.
-Le Chaud Lapin-
Now that I have read your other posts I can more easily understand your question. My system can recognize any machine-generated text. It must be provided with exactly what to look for; this is typically done by specifying one or more FontInstances: (a) font typeface name, (b) point size or pixel height, (c) style (bold, italic, underline, strikeout), and (d) foreground and background colors.
...
The Xlib XSendEvent() call (the C binding of the SendEvent protocol request) does that, on systems where the X Window System is in use. For a Perl version see
-jiw
No it is not stochastic at all, the whole process is completely deterministic.
You must provide it with the means of knowing the precise pixel pattern of every glyph that it must recognize; this is typically done by specifying a FontInstance: (a) font typeface name, (b) point size or pixel height, (c) style (bold, italic, underline, strikeout), and (d) foreground and background colors.
It can process many different FontInstances simultaneously. This part of the system is operational and fully tested. It provides 100% accuracy on any FontInstance that is not inherently ambiguous. The default FontInstance for much of MS Windows, 8-point Tahoma, is processed with 100% accuracy. Simple heuristics can be applied to get very close to 100% accuracy on most FontInstances.
Actually, it could be set up to process all currently known font families. The simplest way to do this would be to build the DFA for the lower-case vowels of every FontInstance in black on white. The text would then be required to be transformed to black on white. The system could then quickly determine the correct FontInstance on its own, and load up the appropriate full DFA(s). This assumes machine-generated text that is not dithered or anti-aliased. With dithering, the problem of transforming the text to black on white becomes more complex, yet still feasible.
With those parameters, it is indeed possible to find matches. How could you not? If your software runs on the same computer as the windows it is monitoring, then certainly, if you render a piece of text using parameters that match what is displayed, you will have an exact match, even with the effects of anti-aliasing, transformations, etc.
However, I should point out again: given that the user of your software has to specify these parameters anyway, and given that text not generated by the underlying font system will not, in general, be recognized by your software, it remains that the most important targets of recognition are pieces of text generated by the GUI system.
But it is possible to intercept _all_ rendering of such text through well-defined APIs. In other words, if I were interested in knowing whether there was a window that had the word "JFET" in it, I would have two options.
Do you see? By interposition into the GUI subsystem, it becomes far easier to describe what you are looking for. Font face, point size, styling, and color become irrelevant if they don't matter.
There is something else that is important. With your system, it seems that you are taking snapshots. The problem with snapshots is that there is a chance you will miss something, unless you plan to bump up the rate of frame-grabbing so high that you miss nothing. With my hypothetical system, there would never be a need to take a snapshot. You'd always know the state of the system.
-Le Chaud Lapin-
Ok, I see what you are doing now. I hate to rain on anyone's parade, especially one where the objective is ambitious, but you should know that the ultimate result of what you are doing could be achieved in a way that is probably superior in many respects to the image-based method.
One example is simple. Let's say that a programmer wants to use your software to know whenever the string "You Have Mail" appears anywhere on the screen, knowing that there is a mail application that pops up a window with this message. He specifies the font family, point size, style, and background/foreground colors of the little window that contains this message. To get this information, he spends 10 minutes repeatedly sending mail messages to himself to force the window to pop up, and when it does, he eyeballs the message to ascertain the parameters. Finally, he goes to your software and enters arguments for these parameters. Then he tells your software to run, specifying a rate-of-grab of frame buffers high enough that the window, which pops up for only three seconds, is not missed.
Compare that to not having to force anything to pop up or eyeball anything: simply typing in "you have mail", checking the case-insensitive box, and being done with it. No rate-of-grab would be necessary because there would be no frame grabbing. The monitoring software would simply "know" the state of the entire GUI system at any point in time.
Certainly you will agree that, if this is what your software does, the latter method has significant advantages?
-Le Chaud Lapin-
On a sunny day (Tue, 31 Oct 2006 19:24:17 -0600) it happened "Peter Olcott" wrote in :
I am sure you can, but because of the large amount of stuff that potentially _can_ run (X11 with its own drivers, the text console with its own drivers, svgalib), you'd first have to find out what is running and how, I think, before you can access any display buffer[s]. There may be more than one graphics card too :-)
My system is the only approach that is inherently compatible with every system, platform, and application. There are many cases where the required information is unavailable from the system internals; my system handles all of those cases. Now that we have dual-core machines, it is possible, using a DFA, to process many screens very quickly. I expect that my system could even play and win fast-paced video games.
And significant disadvantages, for example a false-positive match. For something as simple as that, my system might be able to process as many as 100 frames per second. In fact, false positives may be the biggest problem with the approach you are proposing compared to my method. Another problem is that there are times when this "text" message is displayed as a bitmap, rather than as actual text.
While I don't actually 'like' the keyboard-interface approach being asked for, these devices are readily available, so it should be easy to test how it all behaves. Look at the Hagstrom KE72.