** Python Support Thread ** - Page 27 - Computer Technical Help

Two Plus Two Forums Other Topics Computer and Technical Help


07-26-2012 , 06:29 AM

#651

Alex Wice

aka Double Ice

Join Date: Jun 2007 Posts: 5,950

The elements of "tds" were not strings; they were bs4 tags (specifically: class 'bs4.element.Tag'), and .string on a tag gives back a bs4 NavigableString rather than a plain string. Maybe someone else can pick up the slack and explain exactly why you got the error; it seems like something about those objects interacts badly with print.

Anyways, try this and it should work.

Code:

import urllib2
from bs4 import BeautifulSoup

soup = BeautifulSoup(urllib2.urlopen('http://www.timeanddate.com/worldclock/astronomy.html?n=78').read())

for row in soup('table', {'class' : 'spad'})[0].tbody('tr'):
  tds = row('td')
  tds_trunc = tds[0].string, tds[1].string
  print tds_trunc


07-26-2012 , 07:15 AM

#652

LA_Price

adept

Join Date: Feb 2004 Posts: 1,119

HolidayintheSun,

Your code worked for me running python 2.7 on windows.

Code:

for row in soup('table', {'class' : 'spad'})[0].tbody('tr'):
...     tds = row('td')
...     print  tds[0].string, tds[1].string
...     
Jul 26, 2012 5:32 AM
Jul 27, 2012 5:34 AM
Jul 28, 2012 5:35 AM
Jul 29, 2012 5:37 AM
Jul 30, 2012 5:39 AM
Jul 31, 2012 5:40 AM
Aug 1, 2012 5:42 AM

I'd start running through the code line by line in an interpreter. For instance, after you import BeautifulSoup and urllib2, run the first line

Code:

>>> soup= BeautifulSoup(urllib2.urlopen('http://www.timeanddate.com/worldclock/astronomy.html?n=78').read())

Then just type

Code:

>>>soup

and you should get output like this

Code:

<!DOCTYPE html>

<!--

scripts and programs that download content transparent to the user are not allowed without permission

-->
<html lang="en">
<head>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type">
<title>Sunrise and Sunset for Ireland – Dublin – coming days</title>


07-26-2012 , 12:20 PM

#653

HolidayInTheSun

old hand

Join Date: Dec 2011 Posts: 1,307

thanks for the very helpful replies. comments below:

Quote:

Originally Posted by Alex Wice

yes, that worked. i get:

Code:

for row in soup('table', {'class' : 'spad'})[0].tbody('tr'):
  tds = row('td')
  tds_trunc = tds[0].string, tds[1].string
  print tds_trunc

  
(u'Jul 26, 2012', u'5:32 AM')
(u'Jul 27, 2012', u'5:34 AM')
(u'Jul 28, 2012', u'5:35 AM')
(u'Jul 29, 2012', u'5:37 AM')
(u'Jul 30, 2012', u'5:39 AM')
(u'Jul 31, 2012', u'5:40 AM')
(u'Aug 1, 2012', u'5:42 AM')

however, this only works if i enter the code in multiple steps. i first tried to copy and paste the entire thing (from the bs4 import to print tds_trunc), and when i did that, nothing happened: nothing was printed, and the next line returned to >>>. it only worked when i entered it piece by piece (first import bs4, then import urllib2, then define soup, then run the for loop).

Quote:

Originally Posted by LA_Price

HolidayintheSun,

Your code worked for me running python 2.7 on windows.

i'm using python 2.7.3 on windows 7 and it's still not working. i don't understand how that's possible. it's not calling anything from the hard drive, so shouldn't the program behave identically for both of us?

Quote:

I'd start running through the code line by line in an interpreter. For instance
after you import BeautifulSoup and urllib2 run the first line

yeah that's what i tried doing last night. if i define soup, and then just type >>>soup, it prints out the html text which begins with the same text that you identify above. so it has to be something about

Code:

for row in soup('table', {'class' : 'spad'})[0].tbody('tr'):
  tds = row('td')
  print tds[0].string, tds[1].string
  # will print date and sunrise

cause up till then i think i'm okay, the program is doing what it should.


07-26-2012 , 06:08 PM

#654

LA_Price

adept

Join Date: Feb 2004 Posts: 1,119

Hmm, one really common beginner's mistake is to mess up Python's treatment of whitespace. Every line in the loop body has to be indented by the same amount (4 spaces, i.e. 4 hits of the spacebar, is the usual convention):

Code:

for row in soup('table', {'class' : 'spad'})[0].tbody('tr'):
    tds = row('td')
    tds_trunc = tds[0].string, tds[1].string
    print tds_trunc
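To see what the indentation rule actually is, here is a small check written for Python 3 (the thread itself uses 2.7). Python accepts any consistent indent, so the 2 spaces in the earlier posts are legal and 4 is just the convention; what breaks is a line that dedents to a level matching no outer block. The loop snippets are made-up examples and are only compiled, never run.

```python
def compiles(source):
    """Return True if source compiles as a module, False on a syntax/indent error."""
    try:
        compile(source, "<snippet>", "exec")
        return True
    except SyntaxError:  # IndentationError is a subclass of SyntaxError
        return False

# Two-space indentation, used consistently: legal.
two_spaces = "for row in rows:\n  tds = row\n  print(tds)\n"
# Four spaces, the usual convention: also legal.
four_spaces = "for row in rows:\n    tds = row\n    print(tds)\n"
# The second body line dedents to 2 spaces, which matches no outer block.
mixed = "for row in rows:\n    tds = row\n  print(tds)\n"

print(compiles(two_spaces), compiles(four_spaces), compiles(mixed))  # True True False
```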


07-26-2012 , 10:20 PM

#655

HolidayInTheSun

old hand

Join Date: Dec 2011 Posts: 1,307

hmm i tried that and i still get the same result. maybe it's cause i'm using IDLE? i'm so confused now.


07-27-2012 , 02:09 AM

#656

Xhad

Carpal 'Tunnel

Join Date: Jul 2005 Posts: 10,962

are you c/p'ing directly into the console? Or into a file in the editor?


07-27-2012 , 02:34 AM

#657

HolidayInTheSun

old hand

Join Date: Dec 2011 Posts: 1,307

i was trying to c/p directly into console.

also a piece of information i left out, but might be helpful:

if i type:

Code:

import urllib2
from bs4 import BeautifulSoup

all in one step, then beautiful soup does not import properly. i have to do them one by one in order for bs to actually import. i also tried indenting the second line by 4 spaces and it did not help.
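A likely explanation for the paste problem (hedged): IDLE's shell compiles each entry in "single" mode, which expects exactly one statement. In Python 3, pasting several statements at once raises "multiple statements found while compiling a single statement"; in 2.7 the extra lines seem to have been silently ignored, which would fit the import that never happened. A saved .py file is compiled in "exec" mode instead, which takes any number of statements. This Python 3 sketch reproduces the difference with the built-in compile(); the pasted imports are just stand-ins.

```python
pasted = "import json\nfrom collections import Counter\n"

def compiles_as(source, mode):
    """Return True if source compiles in the given mode ('single' or 'exec')."""
    try:
        compile(source, "<pasted>", mode)
        return True
    except SyntaxError:
        return False

# The shell's mode: one statement per entry, so two pasted imports are rejected.
print(compiles_as(pasted, "single"))  # False
# A module's mode: the whole paste compiles, and running it defines both names.
print(compiles_as(pasted, "exec"))    # True
namespace = {}
exec(compile(pasted, "<pasted>", "exec"), namespace)
print("Counter" in namespace)         # True
```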
-------------------------------------------------
i just tried to copy paste into the editor and then "run module." when i do this, i still get an error message

Code:

Traceback (most recent call last):
  File "C:\Python27\Scripts\sunrise.py", line 8, in <module>
    print tds[0].string, tds[1].string
  File "C:\Python27\lib\idlelib\rpc.py", line 595, in __call__
    value = self.sockio.remotecall(self.oid, self.name, args, kwargs)
  File "C:\Python27\lib\idlelib\rpc.py", line 210, in remotecall
    seq = self.asynccall(oid, methodname, args, kwargs)
  File "C:\Python27\lib\idlelib\rpc.py", line 225, in asynccall
    self.putmessage((seq, request))
  File "C:\Python27\lib\idlelib\rpc.py", line 324, in putmessage
    s = pickle.dumps(message)
  File "C:\Python27\lib\copy_reg.py", line 74, in _reduce_ex
    getstate = self.__getstate__
RuntimeError: maximum recursion depth exceeded

when i try to run

Code:

import urllib2
from bs4 import BeautifulSoup

soup = BeautifulSoup(urllib2.urlopen('http://www.timeanddate.com/worldclock/astronomy.html?n=78').read())

for row in soup('table', {'class' : 'spad'})[0].tbody('tr'):
  tds = row('td')
  print tds[0].string, tds[1].string
  # will print date and sunrise

but when i run

Code:

import urllib2
from bs4 import BeautifulSoup

soup = BeautifulSoup(urllib2.urlopen('http://www.timeanddate.com/worldclock/astronomy.html?n=78').read())

for row in soup('table', {'class' : 'spad'})[0].tbody('tr'):
  tds = row('td')
  tds_trunc = tds[0].string, tds[1].string
  print tds_trunc

this works just fine and prints the date along with the sunrise time.
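Going by the traceback above, the failure happens while IDLE pickles the printed value to pass it between its two processes (the rpc.py and pickle.dumps frames). A likely explanation: in Python 2, print hands a unicode object straight to IDLE's output, and .string returns a NavigableString, a unicode subclass that keeps references to the rest of the parse tree, so pickling it recurses through the whole document. Printing the tuple works because print turns the tuple into a plain str first. Converting each value yourself, e.g. unicode(tds[0].string) in 2.7, should have the same effect. The Python 3 sketch below uses a made-up TreeLinkedString class as a stand-in for NavigableString to show that str() copies out just the characters and drops the attached references.

```python
class TreeLinkedString(str):
    """Hypothetical stand-in for bs4's NavigableString: text that also
    remembers the node it came from."""

    def __new__(cls, value, parent=None):
        obj = super().__new__(cls, value)
        obj.parent = parent  # reference back into the (pretend) parse tree
        return obj

cell_text = TreeLinkedString("Jul 26, 2012", parent={"name": "td"})
plain = str(cell_text)  # an ordinary str with the same characters

print(type(cell_text).__name__, hasattr(cell_text, "parent"))  # TreeLinkedString True
print(type(plain).__name__, hasattr(plain, "parent"))          # str False
```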

Last edited by HolidayInTheSun; 07-27-2012 at 02:39 AM.


07-27-2012 , 02:59 PM

#658

Mariogs37

adept

Join Date: May 2012 Posts: 825

Hey guys,

I'm trying to get my program to read in text from a .txt file (it does a bunch of stuff with the text afterward but that's not the issue). Here's what I have:

Code:

def read_file():
    f = open(text.txt, "r")
    text = f.read()


def start():


    space_count = 0

    speed("fastest")

    color("red")

    width = input("How wide would you like each character to be?: ")

    home_turtle(0,0)
    



def main():


    start()

    read_file()

    for character in text.txt():

        assess_letter()

        forward_turtle()

    
main()

And here's the error I get:

"UnboundLocalError: local variable "text" referenced before assignment"

Thoughts on how to fix this?


07-27-2012 , 04:55 PM

#659

daveT

S.A.G.E. Master

Join Date: Jun 2005 Posts: 23,955

Your function needs to return a value. Python doesn't have implicit returns (a function with no return statement gives back None), so you have to be explicit and hand the value back to the caller:

Code:

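def read_file():
    f = open('text.txt', 'r')   # the file name is a string, so it needs quotes
    text = f.read()
    f.close()
    return text                 # hand the contents back to whoever called read_file()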

then you can capture the function's return value like so:

Code:

myVariable = read_file()

print myVariable

So with that knowledge, you can do this:

Code:

def main():


    start()

    myText = read_file()

    for character in myText:

        assess_letter()

        forward_turtle()

The above isn't tested, but it should more or less work. You might want to comment out the assess_letter() and forward_turtle() calls and just put in print myText to a) see what the output looks like and b) avoid errors coming from THOSE functions as well.
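daveT's suggestion as a self-contained Python 3 sketch. Assumptions: the file name is passed in as a parameter, a temporary file stands in for text.txt, and a plain letter count stands in for assess_letter() and forward_turtle(), which aren't shown in the thread.

```python
import os
import tempfile

def read_file(path):
    """Read the whole file and hand its contents back to the caller."""
    with open(path, "r") as f:
        return f.read()

def main(path):
    my_text = read_file(path)  # the returned value now lives in main's scope
    letters = 0
    for character in my_text:
        # hypothetical stand-in for assess_letter() / forward_turtle(): count letters
        if character.isalpha():
            letters += 1
    return letters

# Demo with a throwaway file instead of a real text.txt.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("Hi there!")
    demo_path = tmp.name
result = main(demo_path)
print(result)  # 7
os.remove(demo_path)
```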


07-27-2012 , 08:58 PM

#660

Alex Wice

aka Double Ice

Join Date: Jun 2007 Posts: 5,950

Holiday, you should make a new *.py file, paste the code into that file, and then run the file (F5).


07-27-2012 , 09:13 PM

#661

Alex Wice

aka Double Ice

Join Date: Jun 2007 Posts: 5,950

Quote:

Originally Posted by Mariogs37

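Yes, text is a local variable in read_file. Because read_file assigns to text, Python treats every use of text inside that function as local, so open(text.txt, "r") tries to read text before the assignment on the next line, hence the UnboundLocalError. (The file name also needs quotes: 'text.txt'.) And in main, "text.txt()" doesn't have any meaning at all, since text only exists inside read_file.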
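A tiny Python 3 reproduction of that scoping rule; the function and variable names are made up, loosely modeled on read_file:

```python
def read_file_broken():
    # 'text' is assigned later in this function, so Python treats every use
    # of 'text' in here as local, and reading it first fails.
    name = text + ".txt"
    text = "contents"
    return name

text = "global value"  # a global with the same name does not help

try:
    read_file_broken()
    error_name = None
except UnboundLocalError as exc:
    error_name = type(exc).__name__

print(error_name)  # UnboundLocalError
```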

Honestly I think it's totally okay if your files are global. So, like...

Code:

fi = open('text.txt','r')

def start():
    pass  # stuff goes here

def main():
    start()
    for cx in fi.read():  # read() returns one string, so this loops over characters
        assess_letter()
        forward_turtle()
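One thing to watch with a global file: looping over a file object gives you one line at a time, not one character, so for character-by-character work you want fi.read() (or a nested loop over each line). A Python 3 check, with io.StringIO standing in for the open file:

```python
import io

# io.StringIO behaves like an open text file, so no real text.txt is needed.
fi = io.StringIO("ab\ncd\n")
by_iteration = [chunk for chunk in fi]  # iterating a file yields whole lines

fi = io.StringIO("ab\ncd\n")
by_character = list(fi.read())          # read() gives one string; iterate that for characters

print(by_iteration)  # ['ab\n', 'cd\n']
print(by_character)  # ['a', 'b', '\n', 'c', 'd', '\n']
```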


08-06-2012 , 07:56 PM

#662

Mariogs37

adept

Join Date: May 2012 Posts: 825

Hey guys,

So I've written a program that prompts the user for a letter and then records the number of times each letter is input in a list. So list[1] is the number of times A is input, list[2] for B, etc.

I want the last part of my program to tell me what the most common letter was (or letters if several letters are input the same number of times). This is the part I'm struggling with.

I thought of creating a list like:

Code:

most_common = [0] * 27 #I'm not using the first entry spot

max_occurrences = 0

for letter in range(1, 27):
      if list[letter] == max_occurrences:
            most_common[letter] = list[letter]
      elif list[letter] > max_occurrences:
            most_common = [0] * 27
            most_common[letter] = list[letter]

#Then I'll have some code that looks at each element in my most_common list and prints letters for each spot in the list that isn't 0 (these should be all the numbers that are maxes).

Last edited by Mariogs37; 08-06-2012 at 07:58 PM. Reason: Should be in the python support thread, sorry about that


08-06-2012 , 09:05 PM

#663

jukofyork

Carpal 'Tunnel

Join Date: Sep 2004 Posts: 11,749

Quote:

Should be in the python support thread, sorry about that

I merged it in for you.

Juk


08-06-2012 , 09:15 PM

#664

jukofyork

Carpal 'Tunnel

Join Date: Sep 2004 Posts: 11,749

Code:

most_common = [0] * 26

max_occurrences = -1
num_max = 0

for letter in range(1, 27):
      if list[letter] > max_occurrences:
            max_occurrences = list[letter]
            most_common[0] = letter
            num_max = 1
      elif list[letter] == max_occurrences:
            most_common[num_max] = letter
            num_max = num_max + 1

At the end of this code you should have a list (vector?) of indexes with the first num_max entries filled, e.g.:

num_max=3, max_occurrences=6, most_common=[1, 4, 26, 0, 0, ...]

on exit would tell you that there are 3 letters with the max count, the max count is 6 occurrences and the most common letters were A, D and Z.

It might not be quite syntactically correct as I don't use Python, but hopefully you should get the idea
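For what it's worth, the loop above does run as Python. Here it is wrapped in a Python 3 function that takes the tallies as a parameter named counts instead of list (a variable named list hides the built-in list type); the tallies in the example are made up.

```python
def most_common_letters(counts):
    """counts[1]..counts[26] hold the tallies for A..Z (counts[0] is unused).
    Returns (num_max, max_occurrences, indexes of the most common letters)."""
    most_common = [0] * 26
    max_occurrences = -1
    num_max = 0
    for letter in range(1, 27):
        if counts[letter] > max_occurrences:
            # new record: restart the list of winners
            max_occurrences = counts[letter]
            most_common[0] = letter
            num_max = 1
        elif counts[letter] == max_occurrences:
            # tie with the current record: add it after the others
            most_common[num_max] = letter
            num_max += 1
    return num_max, max_occurrences, most_common[:num_max]

# Hypothetical tallies: A, D and Z each seen 6 times, B seen twice.
counts = [0] * 27
counts[1] = counts[4] = counts[26] = 6
counts[2] = 2
num_max, max_occ, winners = most_common_letters(counts)
print(num_max, max_occ, [chr(i + 64) for i in winners])  # 3 6 ['A', 'D', 'Z']
```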

An alternative method would be to use a list/vector of tuples (letter/frequency) and then sort them in descending order letting you see the most frequent at the start of the sorted list.
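The tuple-and-sort alternative might look like this in Python 3, a sketch assuming the same counts layout (index 1 is A) with made-up tallies:

```python
def ranked_letters(counts):
    """Build (letter, frequency) pairs from counts[1..26], highest frequency first."""
    pairs = [(chr(i + 64), counts[i]) for i in range(1, 27)]
    # sorted() is stable, so letters with equal counts stay in alphabetical order
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)

counts = [0] * 27
counts[1] = counts[4] = counts[26] = 6  # hypothetical tallies for A, D, Z
counts[2] = 2                           # and B

ranked = ranked_letters(counts)
top = ranked[0][1]
winners = [letter for letter, freq in ranked if freq == top]
print(ranked[:4])  # [('A', 6), ('D', 6), ('Z', 6), ('B', 2)]
print(winners)     # ['A', 'D', 'Z']
```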

Juk


08-06-2012 , 10:26 PM

#665

daveT

S.A.G.E. Master

Join Date: Jun 2005 Posts: 23,955

Are you required to use list because this is a homework assignment? If it's a self-study project, maybe you should look into using a dictionary.
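If a dictionary is allowed, the tally could look roughly like this Python 3 sketch (keys are upper-case letters, so there is no index arithmetic; the input string is made up). collections.Counter from the standard library does the same counting in one call.

```python
def count_letters(text):
    """Tally letters in a dict keyed by the upper-case letter."""
    counts = {}
    for character in text:
        if character.isalpha():
            key = character.upper()
            counts[key] = counts.get(key, 0) + 1
    return counts

counts = count_letters("Banana bread")
top = max(counts.values())
winners = sorted(letter for letter, n in counts.items() if n == top)
print(counts["A"], winners)  # 4 ['A']
```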


08-06-2012 , 11:47 PM

#666

Mariogs37

adept

Join Date: May 2012 Posts: 825

Yeah, required to use list.


08-07-2012 , 04:25 PM

#667

Mariogs37

adept

Join Date: May 2012 Posts: 825

So actually, I figured out how to do this. Thing is, I want the program to read in text until it reads "!", at which point it stops. I'm using a list to keep track of these but I'm not sure how to get the program to prompt the user for more text if it reads through all of the characters the user inputs and doesn't hit "!".

Here's my code so far:

Code:

list = [0] * 27

text = input("Please input some text: ")

for character in text:
    if character != "!":
        if ord[character]-65 < 0 or ord[character] > 25:
           list[26] += 1
        else:
           list[ord[character]-65] += 1


08-07-2012 , 04:58 PM

#668

theOnlyMoment

newbie

Join Date: May 2012 Posts: 42

Quote:

Originally Posted by Mariogs37

Sorry, I'm not entirely sure what you're trying to do, but for one thing, if you want the program to keep asking the user for input, you'll need a while loop.


08-10-2012 , 06:48 AM

#669

Alex Wice

aka Double Ice

Join Date: Jun 2007 Posts: 5,950

Code:

charlist = [0 for i in xrange(27)]
while True:
    inputtext = raw_input("Please input some text: ")
    av = 0  # reset for each line, so an empty line can't leave av undefined or stale
    for character in inputtext:
        av = ord(character)
        if av >= 65 and av <= 90:      # 'A'..'Z'
            charlist[av-64] += 1
        elif av >= 97 and av <= 122:   # 'a'..'z'
            charlist[av-96] += 1
        elif av==33:                   # '!' means stop
            break
    if av==33: break

maxi = max(charlist)
print 'num_max =',maxi
print 'max_occur =',charlist.count(maxi)
print 'most_common =',map(lambda x: chr(x+64), filter(lambda x: charlist[x]==maxi, range(27)))
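A Python 3 variant of the same loop, for reference. raw_input() became input() in Python 3; here the lines come from a list so the function can be tried without typing (in an interactive program you would call input() inside while True instead). It uses a stop flag rather than the leftover av value, which also avoids a NameError if the first line typed is empty. The sample lines are made up.

```python
def tally_until_bang(lines):
    """Count letters A-Z (case-insensitive) from successive lines of input,
    stopping at the first '!'. charlist[1] is A ... charlist[26] is Z."""
    charlist = [0] * 27
    for inputtext in lines:          # each item plays the role of one input() call
        stop = False
        for character in inputtext:
            av = ord(character)
            if 65 <= av <= 90:       # 'A'..'Z'
                charlist[av - 64] += 1
            elif 97 <= av <= 122:    # 'a'..'z'
                charlist[av - 96] += 1
            elif character == "!":
                stop = True
                break
        if stop:
            break
    return charlist

charlist = tally_until_bang(["xyzz", "aa!aaaa", "never read"])
maxi = max(charlist)
most_common = [chr(i + 64) for i in range(1, 27) if charlist[i] == maxi]
print(maxi, most_common)  # 2 ['A', 'Z']
```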

I don't know all the terms of your assignment, so I just did this the way that I would actually code it.

Okay, now let's study this program.

First we set up a character list of 27 counters. I would index it from 0, but since you wanted a=1, that is fine too.

We knew we would have to keep asking the user for text until we knew to stop, so we started with a while loop. Next we took raw input. We did not use input(...) because in Python 2, input() evaluates whatever is typed as a Python expression, so the result is not guaranteed to be a string, which could hurt us later (for example, "for character in inputtext" may not work because inputtext is not an iterable).

Now we looked at each character in the input text. This was the right approach and you did a good job. Had you counted the "a"s in inputtext, then the "b"s, and so on, you would have scanned the text 26 times instead of once.

So, for each character we stored the ordinal number in a variable "av", so we would not have to look it up each time. We then checked if it was in the range [65,90], which would make it a capital letter from A-Z. If it was, we ticked up our charlist. We did the same check for lowercase. Finally, we knew the ordinal number of "!" is ord("!") == 33, so if we saw it, we stopped looking at letters immediately. (So if you type xyzz!aaaaaaa, "a" won't be most common.)

Finally, we checked if the residue on "av" was still 33. This is not that great of a practice but for something like this IMO it is fine. If and only if we saw a "!" (if character == "!"), the control structure would break out at the "elif av==33: break" part, and then it would immediately break out again at "if av==33: break".

Now we come to reporting the result. The num_max is simply going to be the highest number in our tally (namely, charlist). The number of times the max occurs is going to be charlist.count(maxi); this just counts how many times maxi was seen. The last one is tricky, so let's look at it in two steps:

First, we want a list of indices that represent the letters that are most common. For example if A, B, and D are most common, we want a list of [1,2,4]. The appropriate code for this is "filter(lambda x: charlist[x]==maxi, range(27))". What that does is, it takes a list [0,1,2,...,26], and it only keeps the elements of the list x for which charlist[x] == maxi -- namely, that it was a maximum.

Secondly, we have this list (eg. nicelist = [1,2,4]) and we want to get to ['A','B','D']. The correct code is going to be map(lambda x: chr(x+64), nicelist). What that does is, it goes one by one down nicelist and it changes every element x to chr(x+64). Since each element can only be a number from 1 to 26, it will change everything to one of chr(65) = 'A', chr(66) = 'B', etc. up to chr(90) = 'Z'.

Putting it together, we get this chunky line "map(lambda x: chr(x+64), filter(lambda x: charlist[x]==maxi, range(27)))".
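The same line written as a list comprehension, checked side by side in Python 3 (where map and filter return iterators, hence the list() around them). The tallies are made up so that A, B and D tie:

```python
charlist = [0] * 27
for i in (1, 2, 4):  # hypothetical tallies: A, B and D seen 3 times each
    charlist[i] = 3
charlist[3] = 1      # C seen once
maxi = max(charlist)

# map/filter version
via_map = list(map(lambda x: chr(x + 64), filter(lambda x: charlist[x] == maxi, range(27))))
# the same thing as a list comprehension
via_comprehension = [chr(x + 64) for x in range(27) if charlist[x] == maxi]

print(via_map, via_comprehension)  # ['A', 'B', 'D'] ['A', 'B', 'D']
```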

If map, filter and lambda are new to you, you can loop through the list and use the same ideas.

Code:

common = []
for i in xrange(len(charlist)):
    if charlist[i]==maxi: common.append(chr(64+i))
print common

I hope that helps, if you told me the assignment more maybe I can help you more simply.


10-20-2012 , 07:01 PM

#670

fluorescenthippo

Pooh-Bah

Join Date: Apr 2005 Posts: 5,849

i'm trying to log into forums using python and getting stuck. i heard mechanize might work, but i suck too much at programming to figure it out.

i am writing a program that goes to my subscribed threads on this and other forums and opens all the new threads in new windows, so with one click per forum i get all the new threads opened up. i have it all working except the login part.
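One way to do the login with only the standard library, sketched in Python 3 rather than with mechanize: POST the login form through an opener that keeps cookies, then fetch the subscription pages with that same opener. The URL and form field names below are hypothetical; you'd take the real ones from the forum's login form, and some login forms also include hidden fields that must be sent back. The sketch only builds the request and does not touch the network.

```python
import http.cookiejar
import urllib.parse
import urllib.request

# Hypothetical values: check the forum's login <form> for the real action URL and field names.
LOGIN_URL = "https://forum.example.com/login.php"
FORM_FIELDS = {"username": "my_name", "password": "my_password"}

# The cookie jar keeps the session cookie the site sets after a successful login,
# so later requests made with the same opener are sent as the logged-in user.
jar = http.cookiejar.CookieJar()
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))

# Form data must be URL-encoded bytes; supplying data makes this a POST request.
body = urllib.parse.urlencode(FORM_FIELDS).encode("ascii")
login_request = urllib.request.Request(LOGIN_URL, data=body)

print(login_request.get_method())  # POST
# To actually log in (needs network access and the real field names):
# opener.open(login_request)
# page = opener.open("https://forum.example.com/...").read()  # hypothetical subscriptions URL
```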


02-19-2013 , 10:58 PM

#671

Lavon Affair

banned

Join Date: Oct 2012 Posts: 537

could somebody walk me through a web scrape with BeautifulSoup?

i am trying to scrape the regular season table from here http://www.basketball-reference.com/.../2011/gamelog/.

so i have

Code:

from bs4 import BeautifulSoup
import urllib2
      
url = 'http://www.basketball-reference.com/teams/BOS/2011/gamelog/'
soup = BeautifulSoup(urllib2.urlopen(url).read())

I am having trouble understanding how to import the data I want from the table. I have some idea of what I need to do from looking at BeautifulSoup tutorials and examples but don't really grasp everything.

By inspecting the table on the website I can see that it is named "sortable stats_table" and that the data is nested under

Code:

<tbody>

then the data for the opponent is

Code:

<td align="left">MIA</td>

the result of the game

and the rest of the data is like this

Code:

<td align="right">240</td>

First off, since I know it is the first table, can I just use soup('table')[0] or do I have to use its name? Then I am really confused about pulling the data out of the table. Do I have to differentiate between the data in <td align="left">, "right" and "center", or can I just grab all of the data without doing that?


02-19-2013 , 11:23 PM

#672

tyler_cracker

Carpal 'Tunnel

Join Date: Apr 2005 Posts: 15,828

i recommend using a debugger (or print statements) to understand how beautifulsoup models the html. once you've got that, extracting the parts you need will be a piece of cake.


02-20-2013 , 12:13 AM

#673

Lavon Affair

banned

Join Date: Oct 2012 Posts: 537

Code:

for row in soup('table')[0].tbody('tr'):
    tds = row('td')
    
    print tds

So that code returns all of the table but with the HTML tags included. How do I strip out just the data?
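The tags come along because tds is a list of Tag objects; what you want is the text inside each one. In BeautifulSoup that's usually .string or .get_text() on each cell, e.g. [td.get_text() for td in row('td')]. Since that needs bs4 and the live page, here is the same idea with Python 3's built-in html.parser instead (a swapped-in technique, not bs4), run on a made-up snippet shaped like the table described above. It also shows that align="left"/"right"/"center" doesn't matter: every <td> is treated alike.

```python
from html.parser import HTMLParser

class TableRows(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag == "td" and self.rows:
            self._in_cell = True
            self.rows[-1].append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.rows[-1][-1] += data  # keep the text; the tags themselves are dropped

# Made-up snippet shaped like the game-log table (values are not real data).
html = """
<table class="sortable stats_table"><tbody>
<tr><td align="left">MIA</td><td align="center">W</td><td align="right">240</td></tr>
<tr><td align="left">NYK</td><td align="center">L</td><td align="right">265</td></tr>
</tbody></table>
"""
parser = TableRows()
parser.feed(html)
print(parser.rows)  # [['MIA', 'W', '240'], ['NYK', 'L', '265']]
```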


02-20-2013 , 08:12 PM

#674

kerowo

lolcat

Join Date: Nov 2005 Posts: 37,183

I'm sure there is a mind boggling complex regex that will filter those out for you. To the google!

Elapsed time 32.5 seconds: http://www.pagecolumn.com/tool/all_about_html_tags.htm