
07-26-2012 , 06:29 AM
Elements of "tds" were not strings; they were bs4 tags (specifically: class 'bs4.element.Tag'). Maybe someone else can pick up the slack and explain why you got the error. It seems like there is some sort of lazy evaluation in .string that interacts badly with print.

Anyways, try this and it should work.

Code:
import urllib2
from bs4 import BeautifulSoup

soup = BeautifulSoup(urllib2.urlopen('http://www.timeanddate.com/worldclock/astronomy.html?n=78').read())

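# Take the first table with class "spad" and walk its body rows.
for row in soup('table', {'class' : 'spad'})[0].tbody('tr'):
    tds = row('td')
    # Pack date and sunrise into a tuple; printing it shows the repr of each value.
    tds_trunc = tds[0].string, tds[1].string
    print tds_trunc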
** Python Support Thread ** Quote
07-26-2012 , 07:15 AM
HolidayintheSun,

Your code worked for me running python 2.7 on windows.

Code:
for row in soup('table', {'class' : 'spad'})[0].tbody('tr'):
...     tds = row('td')
...     print  tds[0].string, tds[1].string
...     
Jul 26, 2012 5:32 AM
Jul 27, 2012 5:34 AM
Jul 28, 2012 5:35 AM
Jul 29, 2012 5:37 AM
Jul 30, 2012 5:39 AM
Jul 31, 2012 5:40 AM
Aug 1, 2012 5:42 AM
I'd start running through the code line by line in an interpreter. For instance
after you import BeautifulSoup and urllib2 run the first line

Code:
>>> soup= BeautifulSoup(urllib2.urlopen('http://www.timeanddate.com/worldclock/astronomy.html?n=78').read())
Then just type

Code:
>>>soup
and you should get output like this

Code:
<!DOCTYPE html>

<!--

scripts and programs that download content transparent to the user are not allowed without permission

-->
<html lang="en">
<head>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type">
<title>Sunrise and Sunset for Ireland – Dublin – coming days</title>
07-26-2012 , 12:20 PM
thanks for the very helpful replies. comments below:

Quote:
Originally Posted by Alex Wice
Elements of "tds" were not strings; they were bs4 tags (specifically: class 'bs4.element.Tag'). Maybe someone else can pick up the slack and explain why you got the error. It seems like there is some sort of lazy evaluation in .string that interacts badly with print.

Anyways, try this and it should work.

Code:
import urllib2
from bs4 import BeautifulSoup

soup = BeautifulSoup(urllib2.urlopen('http://www.timeanddate.com/worldclock/astronomy.html?n=78').read())

for row in soup('table', {'class' : 'spad'})[0].tbody('tr'):
  tds = row('td')
  tds_trunc = tds[0].string, tds[1].string
  print tds_trunc
yes, that worked. i get:
Code:
for row in soup('table', {'class' : 'spad'})[0].tbody('tr'):
  tds = row('td')
  tds_trunc = tds[0].string, tds[1].string
  print tds_trunc

  
(u'Jul 26, 2012', u'5:32 AM')
(u'Jul 27, 2012', u'5:34 AM')
(u'Jul 28, 2012', u'5:35 AM')
(u'Jul 29, 2012', u'5:37 AM')
(u'Jul 30, 2012', u'5:39 AM')
(u'Jul 31, 2012', u'5:40 AM')
(u'Aug 1, 2012', u'5:42 AM')
however, this only works if i enter the code in multiple steps. i first tried to copy paste the entire thing (from import bs4 to print tds_trunc) and when i did that, nothing happened. that is, nothing was printed, and the next line returned to >>>. it only worked when i entered it piece by piece (1st import bs4, then import urllib2, then define soup, then run the for loop).

Quote:
Originally Posted by LA_Price
HolidayintheSun,

Your code worked for me running python 2.7 on windows.
i'm using python 2.7.3 for windows 7 and it's still not working. i don't understand how that's possible? it's not calling anything from the hard drive, shouldn't the program function identically for both of us?


Quote:
I'd start running through the code line by line in an interpreter. For instance
after you import BeautifulSoup and urllib2 run the first line

Code:
>>> soup= BeautifulSoup(urllib2.urlopen('http://www.timeanddate.com/worldclock/astronomy.html?n=78').read())
Then just type

Code:
>>>soup
and you should get output like this

Code:
<!DOCTYPE html>

<!--

scripts and programs that download content transparent to the user are not allowed without permission

-->
<html lang="en">
<head>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type">
<title>Sunrise and Sunset for Ireland – Dublin – coming days</title>
yeah that's what i tried doing last night. if i define soup, and then just type >>>soup, it prints out the html text which begins with the same text that you identify above. so it has to be something about
Code:
for row in soup('table', {'class' : 'spad'})[0].tbody('tr'):
  tds = row('td')
  print tds[0].string, tds[1].string
  # will print date and sunrise
cause up till then i think i'm okay, the program is doing what it should.
07-26-2012 , 06:08 PM
Hmm, one really common beginner's mistake is to mess up Python's treatment of whitespace

Code:
for row in soup('table', {'class' : 'spad'})[0].tbody('tr'):
(should be 4 spaces here)tds = row('td')
(that's 4 hits of spacebar)tds_trunc = tds[0].string, tds[1].string
(should be 4 spaces here)print tds_trunc
07-26-2012 , 10:20 PM
hmm i tried that and i still get the same result. maybe it's cause i'm using IDLE? i'm so confused now.
07-27-2012 , 02:09 AM
are you c/p'ing directly into the console? Or into a file in the editor?
07-27-2012 , 02:34 AM
i was trying to c/p directly into console.

also a piece of information i left out, but might be helpful:

if i type:

Code:
import urllib2
from bs4 import BeautifulSoup
all in one step, then beautiful soup does not import properly. i have to do them one by one in order for bs to actually import. i also tried indenting the second line by 4 spaces and it did not help.
-------------------------------------------------
i just tried to copy paste into the editor and then "run module." when i do this, i still get an error message
Code:
Traceback (most recent call last):
  File "C:\Python27\Scripts\sunrise.py", line 8, in <module>
    print tds[0].string, tds[1].string
  File "C:\Python27\lib\idlelib\rpc.py", line 595, in __call__
    value = self.sockio.remotecall(self.oid, self.name, args, kwargs)
  File "C:\Python27\lib\idlelib\rpc.py", line 210, in remotecall
    seq = self.asynccall(oid, methodname, args, kwargs)
  File "C:\Python27\lib\idlelib\rpc.py", line 225, in asynccall
    self.putmessage((seq, request))
  File "C:\Python27\lib\idlelib\rpc.py", line 324, in putmessage
    s = pickle.dumps(message)
  File "C:\Python27\lib\copy_reg.py", line 74, in _reduce_ex
    getstate = self.__getstate__
RuntimeError: maximum recursion depth exceeded
when i try to run
Code:
import urllib2
from bs4 import BeautifulSoup

soup = BeautifulSoup(urllib2.urlopen('http://www.timeanddate.com/worldclock/astronomy.html?n=78').read())

for row in soup('table', {'class' : 'spad'})[0].tbody('tr'):
  tds = row('td')
  print tds[0].string, tds[1].string
  # will print date and sunrise
but when i run
Code:
import urllib2
from bs4 import BeautifulSoup

soup = BeautifulSoup(urllib2.urlopen('http://www.timeanddate.com/worldclock/astronomy.html?n=78').read())

for row in soup('table', {'class' : 'spad'})[0].tbody('tr'):
  tds = row('td')
  tds_trunc = tds[0].string, tds[1].string
  print tds_trunc
this works just fine and prints the date along with the sunrise time.

Last edited by HolidayInTheSun; 07-27-2012 at 02:39 AM.
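A possible reading of the traceback above (an interpretation, not something confirmed in the thread): it ends inside IDLE's rpc.py at `pickle.dumps(message)`, which suggests IDLE pickles whatever `print` sends to its shell window. The object returned by `.string` is a string subclass that carries links into the parse tree, so pickling it would walk the whole tree. The sketch below imitates that with a hypothetical `TreeString` class and `Node` chain (neither is a bs4 name) and shows that a plain-string copy pickles without trouble. It is written for Python 3; under Python 2 the matching conversion would be `unicode(tds[0].string)`.

```python
import pickle


class Node(object):
    """Hypothetical stand-in for a parse-tree element linked to its neighbour."""

    def __init__(self):
        self.next_element = None


class TreeString(str):
    """Hypothetical string subclass that keeps a link into a parse tree."""


def build_chain(length):
    """Build a linked chain of `length` extra nodes and return its head."""
    head = Node()
    node = head
    for _ in range(length):
        node.next_element = Node()
        node = node.next_element
    return head


text = TreeString("5:32 AM")
text.next_element = build_chain(20000)   # the string now drags a long chain along

try:
    pickle.dumps(text)                   # has to follow every link in the chain
    pickled_ok = True
except RuntimeError:                     # RecursionError is a RuntimeError subclass
    pickled_ok = False

plain = str(text)                        # plain copy with no links attached
plain_bytes = pickle.dumps(plain)        # pickles without recursing
```

That would also fit the tuple version working: printing a tuple prints its repr, which is already a plain string.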
07-27-2012 , 02:59 PM
Hey guys,

I'm trying to get my program to read in text from a .txt file (it does a bunch of stuff with the text afterward but that's not the issue). Here's what I have:

Code:
def read_file():
    f = open(text.txt, "r")
    text = f.read()


def start():


    space_count = 0

    speed("fastest")

    color("red")

    width = input("How wide would you like each character to be?: ")

    home_turtle(0,0)
    



def main():


    start()

    read_file()

    for character in text.txt():

        assess_letter()

        forward_turtle()

    
main()
And here's the error I get:

"UnboundLocalError: local variable "text" referenced before assignment"

Thoughts on how to fix this?
07-27-2012 , 04:55 PM
Your function has to return a value. Python doesn't have implicit returns, so you have to be explicit to get the value out of the function's local scope and back to the caller:

Code:
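def read_file():
    # the filename has to be a quoted string; a bare text.txt is a name lookup
    with open("text.txt", "r") as f:
        text = f.read()
    return text  # hand the contents back to the caller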
then you can call the value of the function like so:

Code:
myVariable = read_file()

print myVariable
So with that knowledge, you can do this:

Code:
def main():


    start()

    myText = read_file()

    for character in myText:

        assess_letter()

        forward_turtle()
The above isn't tested, but it should more or less work. You might want to comment out the assess_letter() and forward_turtle() calls and just put in print myText to a) see what the output looks like and b) prevent THAT error as well.
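A minimal sketch of the return-value idea above, written so it runs on its own. The `path` parameter and the temporary file are assumptions added for illustration (the original `read_file()` takes no arguments and reads text.txt), and the turtle helpers are left out.

```python
import os
import tempfile


def read_file(path):
    """Return the whole contents of the file at `path` as one string."""
    with open(path, "r") as f:
        return f.read()


# Write a small throwaway file so the sketch does not depend on text.txt existing.
handle, demo_path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(handle, "w") as f:
    f.write("abc")

my_text = read_file(demo_path)                     # the caller keeps the returned value
characters = [character for character in my_text]  # looping visits one character at a time
os.remove(demo_path)
```

Because the value comes back through `return`, `main()` can loop over it without any global variable.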
07-27-2012 , 08:58 PM
Holiday you should be making a new *.py file and copy and pasting to that file, then running (F5) that file.
07-27-2012 , 09:13 PM
Quote:
Originally Posted by Mariogs37
Hey guys,

I'm trying to get my program to read in text from a .txt file (it does a bunch of stuff with the text afterward but that's not the issue). Here's what I have:

Code:
def read_file():
    f = open(text.txt, "r")
    text = f.read()


def start():


    space_count = 0

    speed("fastest")

    color("red")

    width = input("How wide would you like each character to be?: ")

    home_turtle(0,0)
    



def main():


    start()

    read_file()

    for character in text.txt():

        assess_letter()

        forward_turtle()

    
main()
And here's the error I get:

"UnboundLocalError: local variable "text" referenced before assignment"

Thoughts on how to fix this?
Yes, text is a local variable to read_file, so it does not exist inside main. That means "text.txt()" doesn't have any meaning there at all....

Honestly I think it's totally okay if your files are global. So, like...

Code:
fi = open('text.txt','r')

def start():
    pass  # stuff goes here

def main():
    start()
    # note: looping over a file object yields one line at a time
    for cx in fi:
        assess_letter()
        forward_turtle()
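One thing worth checking with the global-file approach: looping directly over a file object hands back whole lines, not single characters. A small sketch, using `io.StringIO` as a stand-in for the opened text.txt (an assumption, so the example runs without the file):

```python
import io

# io.StringIO stands in for an open text file so the sketch runs without text.txt.
fi = io.StringIO(u"ab\ncd\n")

lines = [line for line in fi]         # iterating a file object yields lines

fi.seek(0)                            # rewind before reading again
characters = []
for line in fi:
    for cx in line.rstrip("\n"):      # the inner loop walks the characters
        characters.append(cx)
```

So a per-character routine like assess_letter() would need the inner loop (or `fi.read()`) to see one letter at a time.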
08-06-2012 , 07:56 PM
Hey guys,

So I've written a program that prompts the user for a letter and then records the number of times each letter is input in a list. So list[1] is the number of times A is input, list[2] for B, etc.

I want the last part of my program to tell me what the most common letter was (or letters if several letters are input the same number of times). This is the part I'm struggling with.

I thought of creating a list like:

Code:
most_common = [0] * 27 #I'm not using the first entry spot

max_occurrences = 0

for letter in range(1, 27):
      if list[letter] == max_occurrences:
            most_common[letter] = list[letter]
      elif list[letter] > max_occurrences:
            most_common = [0] * 27
            most_common[letter] = list[letter]

#Then I'll have some code that looks at each element in my most_common list and prints letters for each spot in the list that isn't 0 (these should be all the numbers that are maxes).

Last edited by Mariogs37; 08-06-2012 at 07:58 PM. Reason: Should be in the python support thread, sorry about that
08-06-2012 , 09:05 PM
Quote:
Should be in the python support thread, sorry about that
I merged it in for you.

Juk
08-06-2012 , 09:15 PM
Code:
most_common = [0] * 26

max_occurrences = -1
num_max = 0

for letter in range(1, 27):
      if list[letter] > max_occurrences:
            max_occurrences = list[letter]
            most_common[0] = letter
            num_max = 1
      elif list[letter] == max_occurrences:
            most_common[num_max] = letter
            num_max = num_max + 1
At the end of this code you should have a list (vector?) of indexes with the first num_max filled, eg:

num_max=3, max_occurrences=6, most_common={1,4,26}

on exit would tell you that there are 3 letters with the max count, the max count is 6 occurrences and the most common letters were A, D and Z.

It might not be quite syntactically correct as I don't use Python, but hopefully you should get the idea

An alternative method would be to use a list/vector of tuples (letter/frequency) and then sort them in descending order letting you see the most frequent at the start of the sorted list.

Juk
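The sort-based alternative mentioned above could look roughly like this. It is only a sketch: the tallies are made up to match the A, D, Z example, and the names `counts` and `pairs` are assumptions (`counts` is used instead of `list` to avoid shadowing the built-in).

```python
# Hypothetical tallies: index 0 is unused, index 1 is A, ..., index 26 is Z.
counts = [0] * 27
counts[1] = 6    # A
counts[4] = 6    # D
counts[5] = 2    # E
counts[26] = 6   # Z

# Pair each count with its letter, then sort so the largest counts come first
# (ties are ordered alphabetically).
pairs = [(counts[i], chr(64 + i)) for i in range(1, 27)]
pairs.sort(key=lambda pair: (-pair[0], pair[1]))

max_occurrences = pairs[0][0]
most_common = [letter for count, letter in pairs if count == max_occurrences]
num_max = len(most_common)
```

The pairs themselves are still stored in a list, so this may fit a "lists only" requirement, though that depends on the assignment.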
08-06-2012 , 10:26 PM
Are you required to use list because this is a homework assignment? If it's a self-study project, maybe you should look into using a dictionary.
08-06-2012 , 11:47 PM
Yeah, required to use list.
08-07-2012 , 04:25 PM
So actually, I figured out how to do this. Thing is, I want the program to read in text until it reads "!", at which point it stops. I'm using a list to keep track of these but I'm not sure how to get the program to prompt the user for more text if it reads through all of the characters the user inputs and doesn't hit "!".

Here's my code so far:

Code:
list = [0] * 27

text = input("Please input some text: ")

for character in text:

list = [0] * 27

text = input("Please input some text: ")

for character in text:
    if character != "!":
        if ord[character]-65 < 0 or ord[character] > 25:
           list[26] += 1
        else:
           list[ord[character]-65] += 1
08-07-2012 , 04:58 PM
Quote:
Originally Posted by Mariogs37
So actually, I figured out how to do this. Thing is, I want the program to read in text until it reads "!", at which point it stops. I'm using a list to keep track of these but I'm not sure how to get the program to prompt the user for more text if it reads through all of the characters the user inputs and doesn't hit "!".

Here's my code so far:

Code:
list = [0] * 27

text = input("Please input some text: ")

for character in text:

list = [0] * 27

text = input("Please input some text: ")

for character in text:
    if character != "!":
        if ord[character]-65 < 0 or ord[character] > 25:
           list[26] += 1
        else:
           list[ord[character]-65] += 1
Sorry not entirely sure what you're trying to do but for one thing if you want the program to continue asking the user for input then you'll need a while loop
08-10-2012 , 06:48 AM
Code:
charlist = [0 for i in xrange(27)]  # index 0 unused, 1 = A ... 26 = Z
while True:
    inputtext = raw_input("Please input some text: ")
    av = None  # so the check after the loop is safe when the input is empty
    for character in inputtext:
        av = ord(character)
        if av >= 65 and av <= 90:        # uppercase A-Z
            charlist[ord(character)-64] += 1
        elif av >= 97 and av <= 122:     # lowercase a-z
            charlist[ord(character)-96] += 1
        elif av==33:                     # "!" ends the input
            break
    if av==33: break

maxi = max(charlist)
print 'num_max =',maxi
print 'max_occur =',charlist.count(maxi)
print 'most_common =',map(lambda x: chr(x+64), filter(lambda x: charlist[x]==maxi, range(27)))
I don't know all the terms of your assignment, so I just did this the way that I would actually code it.

Okay now let's study this program.

First we set up a character tally list. I would index it from 0, but since you wanted it set up so that a=1, that is fine too.

We knew we would have to keep asking the user for text until we knew to stop, so we started with a while loop. Next we took raw input. We did not use input(...) because in Python 2 it evaluates whatever the user types as a Python expression, so the result is not guaranteed to be a string, which could hurt us later (for example, "for character in inputtext" may not work because inputtext is not an iterable).

Now we looked at each character in the input text. This was the right approach and you did a good job. Had you looked for the number of "a"s in inputtext, then followed by the number of "b"s, etc. you would have made a quadratic number of comparisons instead of a linear amount.

So, for each character we stored the ordinal number in a variable "av", so we would not have to look it up each time. We then checked if it was in the range [65,90] which would make it a capital letter from A-Z. If it was, we ticked up our charlist. We did the same in checking for lowercase. Finally, we knew the ordinal number of "!" would be ord("!") == 33, so if that were true, we stopped looking at letters immediately. (So if you type xyzz!aaaaaaa, "a" won't be most common.)

Finally, we checked if the residue on "av" was still 33. This is not that great of a practice but for something like this IMO it is fine. If and only if we saw a "!" (if character == "!"), the control structure would break out at the "elif av==33: break" part, and then it would immediately break out again at "if av==33: break".
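If the double break on "av" feels fragile, one alternative (a sketch, not the code from the post) is to put the loops in a function and leave both at once with `return`. The function name `tally_letters` and its `lines` parameter are made up here; `lines` stands in for repeated raw_input() calls so the example runs without a keyboard.

```python
def tally_letters(lines):
    """Count letters A-Z (either case) until a "!" is seen.

    `lines` is any iterable of strings standing in for successive user inputs.
    """
    charlist = [0] * 27                  # index 0 unused, 1 = A, ..., 26 = Z
    for inputtext in lines:
        for character in inputtext:
            if character == "!":
                return charlist          # one return leaves both loops at once
            av = ord(character)
            if 65 <= av <= 90:           # uppercase A-Z
                charlist[av - 64] += 1
            elif 97 <= av <= 122:        # lowercase a-z
                charlist[av - 96] += 1
    return charlist                      # input ran out without a "!"


charlist = tally_letters(["xyzz!aaaaaaa", "never read"])
```

With the sample input, only x, y and the two z's are counted; everything after the "!" is ignored, including the second string.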

Now we come to reporting the result. The num_max is simply going to be the highest number in our tally (namely, charlist). The number of times the max occurs is going to be charlist.count(maxi) -- this just counts how many times maxi was seen. The last one is tricky, so let's look at it in two steps:

First, we want a list of indices that represent the letters that are most common. For example if A, B, and D are most common, we want a list of [1,2,4]. The appropriate code for this is "filter(lambda x: charlist[x]==maxi, range(27))". What that does is, it takes a list [0,1,2,...,26], and it only keeps the elements of the list x for which charlist[x] == maxi -- namely, that it was a maximum.

Secondly, we have this list (eg. nicelist = [1,2,4]) and we want to get to ['A','B','D']. The correct code is going to be map(lambda x: chr(x+64), nicelist). What that does is, it goes one by one down nicelist and it changes every element x to chr(x+64). Since each element can only be a number from 1 to 26, it will change everything to one of chr(65) = 'A', chr(66) = 'B', etc. up to chr(90) = 'Z'.

Putting it together, we get this chunky line "map(lambda x: chr(x+64), filter(lambda x: charlist[x]==maxi, range(27)))".


If map, filter, and lambda are new to you, you can iterate through the list and use the same ideas.

Code:
common = []
for i in xrange(len(charlist)):
    if charlist[i]==maxi: common.append(chr(64+i))
print common
I hope that helps. If you tell me more about the assignment, maybe I can help you more simply.
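The same filter-then-map step can also be written as a single list comprehension. This is a sketch with made-up tallies; it starts at index 1 to skip the unused slot, and it behaves the same on Python 3, where `map` and `filter` return iterators rather than lists.

```python
# Hypothetical tally: A, B and D each seen three times, C once.
charlist = [0] * 27
charlist[1] = charlist[2] = charlist[4] = 3
charlist[3] = 1
maxi = max(charlist)

# Keep the indices whose tally equals the maximum and turn each into a letter.
most_common = [chr(i + 64) for i in range(1, 27) if charlist[i] == maxi]
```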
10-20-2012 , 07:01 PM
im trying to log into forums using python and getting stuck. I heard mechanize might work but i suck too much at programming to figure it out.

i am writing a program to go to my subscriptions threads for this and other forums and open all the new threads in new windows. I have it all working except the log in part. so with one click per forum i get all the new threads opened up
02-19-2013 , 10:58 PM
could somebody walk me through a web scrape with BeautifulSoup?

i am trying to scrape the regular season table from here http://www.basketball-reference.com/.../2011/gamelog/.

so i have
Code:
from bs4 import BeautifulSoup
import urllib2
      
url = 'http://www.basketball-reference.com/teams/BOS/2011/gamelog/'
soup = BeautifulSoup(urllib2.urlopen(url).read())
I am having trouble understanding how to import the data I want from the table. I have some idea of what I need to do from looking at BeautifulSoup tutorials and examples but don't really grasp everything.

By inspecting the table on the website I can see that it is named "sortable stats_table" and that the data is nested under
Quote:
<tbody>
then
Quote:
<tr class data-row="0'>
the data for the opponent is
Code:
<td align="left">MIA</td>
the result of the game
Quote:
<td align="center">W</td>
and the rest of the data is like this
Code:
<td align="right">240</td>
First off, since I know it is the first table, can I just use ('table')[0] or do I have to use its name? Then I am really confused about pulling the data out of the table. Do I have to differentiate between the data in <td align="left", "right", "center"> or can I just grab all of the data without doing that?
02-19-2013 , 11:23 PM
i recommend using a debugger (or print statements) to understand how beautifulsoup models the html. once you've got that, extracting the parts you need will be a piece of cake.
02-20-2013 , 12:13 AM
Code:
for row in soup('table')[0].tbody('tr'):
    tds = row('td')
    
    print tds

So that code returns all of the table but with the HTML tags included. How do I strip out just the data?
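For reference, each element of `tds` is a Tag, and its text is available through `.string` (as used earlier in this thread) or `get_text()`, so the align attribute does not need to be checked. The sketch below shows the same idea, collecting cell text row by row, using only the standard library's `HTMLParser` as a stand-in so it runs without bs4 or a network connection. The `CellText` class and the sample markup are assumptions built from the cells quoted above.

```python
try:
    from html.parser import HTMLParser   # Python 3
except ImportError:
    from HTMLParser import HTMLParser    # Python 2


class CellText(HTMLParser):
    """Collect the text of every <td>, grouped by <tr>."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.rows = []
        self.in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])          # start a new row
        elif tag == "td" and self.rows:
            self.in_cell = True
            self.rows[-1].append("")      # start a new, empty cell

    def handle_endtag(self, tag):
        if tag == "td":
            self.in_cell = False

    def handle_data(self, data):
        if self.in_cell:
            self.rows[-1][-1] += data     # text inside the current cell


sample = ('<table><tbody><tr>'
          '<td align="left">MIA</td>'
          '<td align="center">W</td>'
          '<td align="right">240</td>'
          '</tr></tbody></table>')

parser = CellText()
parser.feed(sample)
parser.close()
rows = parser.rows
```

In the bs4 loop above, the rough equivalent would be building a list from `td.get_text()` for each `td` in `tds` instead of printing the tags themselves.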
02-20-2013 , 08:12 PM
I'm sure there is a mind boggling complex regex that will filter those out for you. To the google!

Elapsed time 32.5 seconds: http://www.pagecolumn.com/tool/all_about_html_tags.htm
02-20-2013 , 11:55 PM
why would you take a guy working on the right path toward a solution and drop him in the dark forest of regex?
