18:28:02 (C): jpez: go!
18:28:01 (C): jpez: and it's in python
18:25:49 (C): jpez: would all match
18:25:46 (C): jpez: bob smith is 1337
18:25:39 (C): jpez: bob is 1337
18:25:33 (C): jpez: smith is 1337
18:25:27 (C): jpez: so
18:25:24 (C): jpez: and i would like to find strings where it's "subject is *"
18:25:02 (C): jpez: so bob smith for example
18:24:56 (C): jpez: also number of words
18:24:38 (C): jpez: i have a subject of arbitrary length and number of characters
18:24:27 (C): jpez: ok

this won't handle generic whitespaciness, so I'm assuming you have formatted input. otherwise, use re.split(r'\s+', thing) or something of that sort (the pattern goes first, then the string). you have to import re.

this will handle "Bob Smith is 1337", "Bob is 1337" and "Smith is 1337" if subject is "Bob Smith" but not "Bobby Smith is 1337" if the subject is "Bobby Tom Smith"
if you want to be able to do that, you need a loop


def jpezFunction(subject, sentence):
 firstHalf = sentence.split(" is ")[0]
 return firstHalf in subject
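
on the whitespace point above: here is a rough sketch of cleaning the input first, assuming you just want every run of spaces/tabs collapsed to a single space before calling the function. normalize_spaces is a name made up for this example, not something from a library.

```python
import re

def normalize_spaces(text):
    # Trim the ends, then collapse every run of whitespace into one space
    return " ".join(re.split(r"\s+", text.strip()))

print(normalize_spaces("Bob   Smith\tis  1337"))  # Bob Smith is 1337
```

after that, splitting on " is " and " " behaves the same as it does on tidy input.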

the smarter version follows


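def jpezFunction(subject, sentence):
 subjects = subject.split(" ")
 firstHalf = sentence.split(" is ")[0]
 possibleSubjects = firstHalf.split(" ")
 # quick check: the text before " is " appears in the subject as-is
 if firstHalf in subject:
  return True
 # otherwise every word before " is " must be one of the subject's words
 for ps in possibleSubjects:
  if ps not in subjects:
   return False
 return True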

here's the list comprehension version, which is much cooler. you can make it faster if you pull those splits out and make variables from them, but one line functions look cooler


def jpezFunction(subject, sentence):
 return False not in [ps in subject.split(" ") for ps in sentence.split(" is ")[0].split(" ")]
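
for comparison, the "pull those splits out" variant could look something like the sketch below. the name jpez_function_fast and the use of a set plus all() are choices made for this example, not part of the original one-liner; the intent is the same check, with each split done only once.

```python
def jpez_function_fast(subject, sentence):
    # Split once and keep the subject's words in a set for quick lookups
    subject_words = set(subject.split(" "))
    candidate_words = sentence.split(" is ")[0].split(" ")
    # True only if every word before " is " is one of the subject's words
    return all(word in subject_words for word in candidate_words)

print(jpez_function_fast("Bobby Tom Smith", "Bobby Smith is 1337"))  # True
print(jpez_function_fast("Bob Smith", "Alice is 1337"))              # False
```

all() also stops at the first word that isn't in the subject, so it doesn't build the whole list first.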
Python 3.0 (should work the same in 2.5/2.6 with some slight tweaks)

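import re

def jpezFunc(sentence, subject, verb='is'):
    # One optional capture group per subject word, in order
    p = ""
    for s in subject.split(' '):
        p += r"({0}\s+)?".format(re.escape(s))
    # The verb must be a whole word; anything may follow it
    p += r'{0}(?:\s+.*)?$'.format(re.escape(verb))
    m = re.compile(p).match(sentence)
    # Fail if nothing matched or no subject word was actually present
    if not m or (len(m.groups()) - m.groups().count(None)) == 0:
        return False
    return True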

Note that the heavy lifting is done by the regex engine, which in the standard Python interpreter is implemented in optimized C - meaning it'll be fast.
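
If the same subject is checked against many sentences, one way to lean on that speed is to compile the pattern once and reuse it. The sketch below is only an illustration: make_matcher is a hypothetical helper name, and the re.escape calls and the trailing $ anchor are assumptions added for this example.

```python
import re

def make_matcher(subject, verb="is"):
    # Hypothetical helper: build and compile the pattern once per subject
    parts = [r"({0}\s+)?".format(re.escape(word)) for word in subject.split(" ")]
    pattern = re.compile("".join(parts) + r"{0}(?:\s+.*)?$".format(re.escape(verb)))

    def matches(sentence):
        m = pattern.match(sentence)
        # Need a match and at least one subject word captured
        return bool(m) and any(g is not None for g in m.groups())

    return matches

is_bob = make_matcher("Bob Smith")
print(is_bob("Bob is 1337"))  # True
print(is_bob("is 1337"))      # False
```

The returned function closes over the compiled pattern, so the loop that builds it runs once rather than on every call.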
» All times are UTC - 5 Hours