Dev102 – Needle in a Haystack

Hello World… got deleted!

So here is my 1st real post in my Blog.

This is the response to “A Programming Job Interview Challenge #8 – A Needle in a Haystack

The question was about designing a queue that will create an alert when a certain pattern is found within the input. There are some more minor constrains, which you can check in the original post(Main constrain is that you cannot store the recieved data).

I am writing a C# implementation for this.

Explanation of the code:

The calls requires a pattern in the constructor. The pattern is then stored internally for further reference.

The main problem in this task is that we need to keep track of all possible matches that could happen. Take this pattern  “A A A B” and this input “A A A A A B”. When we get the 1st A we could have a possible match. But when we get to the 4th A, the match failed. But the match that started from the 2nd and 3rd A are still in a valid state. So we have to treat each letter as a potential start for a match. And each potential match has to be evaluated separate. To keep the state of each possible match you could use all kinds of possible datastructures.  I choose a generic list of enumerators. It is not the smallest possible structure I could have picked, but i think it is quite nice (you could also use an array of ints and increment each field as you need to match the next value…).

When a new chunk of data comes in, it will fetch an enumerator from the internal “pattern list”, and moves it onto the 1st position. Then it will add it to a 2nd internal List. This list will hold all “active” tests. After that is done, the class will try to validate all active pattern matches. If the match was OK, it will move the current enumerator to the next item to test. We have a positive match if we cannot move the enumerator, since we matched all items in the list.

If we fail to match a test, then we quite simply discard the current enumerator.

The maximum number of “concurrent tests” is limited to the size of the pattern. So for a 4-Element Pattern we would get a maximum of 4 concurrent tests.

The receive Function is not thread save, since we expect a lot of “sequential” input. The handling should be left to the calling function, or a thread save wrapper.

 

Example pattern: foo bar foo foo

Example input: foo bar foo foo bar foo foo

Result: 4-foo & 7-foo

Code:

    internal class MatchingQueue<T> where T : IEquatable<T>
    {
        private readonly List<T> target;
        private readonly List<IEnumerator<T>> currentMatcher = new List<IEnumerator<T>>();
        private int counter = 0;
        public event EventHandler<MatchingEventArgs> MatchFound;
 
        public MatchingQueue(T[] targetSequence)
        {
            if (targetSequence.GetUpperBound(0)<0)
                throw new InvalidDataException(“Need at least 1 entry in order to match anything….”);
            target = new List<T>(targetSequence);
        }
 
        public void Receive(T data)
        {
            counter++;
           
            IEnumerator<T> newTest = target.GetEnumerator();
            currentMatcher.Add(newTest);
            newTest.MoveNext();
            foreach (IEnumerator<T> tester in currentMatcher.ToArray())
            {
               if (tester.Current.Equals(data))
               {
                   if (!tester.MoveNext())
                   {
                       //No more tests… We have to report and remove
                       Report(data);
                       currentMatcher.Remove(tester);
                   }
                      
               }
               else
                   //failed to match… Kick it
                   currentMatcher.Remove(tester);
            }
            //TODO: Foreward it to next Queue.
        }
 
 
        private void Report(T data)
        {
            EventHandler<MatchingEventArgs> temp = MatchFound;
            if (temp != null)
                temp(this, new MatchingEventArgs(data,counter));
        }
 
        public class MatchingEventArgs : EventArgs
        {
            private readonly T match;
            private readonly int counter;
            public MatchingEventArgs(T match, int counter)
            {
                this.match = match;
                this.counter = counter;
            }
            public int Counter
            {
                get { return counter; }
            }
            public T Message
            {
                get { return match; }
            }
        }
    }

~ by Heiko Hatzfeld on June 23, 2008.

One Response to “Dev102 – Needle in a Haystack”

  1. [...] http://roadrunner74.wordpress.com/2008/06/23/dev102-needle-in-a-haystack/ (Heiko Hatzfeld). [...]

Leave a Reply