In my last post, I said that Python and Ruby were great alternatives to a calculator. I even provided some examples of how you could do nested calculations in the interpreters.

What I failed to notice (and remember) is how Python, and apparently Ruby, handle integer arithmetic. Even in my own examples, the answer comes out differently for Python and Perl. I also used a different example for Ruby, which I shouldn’t have done for consistency’s sake.

Here’s the problem — try this in either Python or Ruby, and you’ll see what happened:

>>> (5 + (2 * 80))/2
82

The answer, of course, should be 82.5. Why is it giving us the wrong answer? It’s not because Python can’t do the math; it’s because we’re telling it to do the math wrong.

Consider:

cranleigh:~ kschwen$ irb
>> (5.0 + (2 * 80))/2
=> 82.5


cranleigh:~ kschwen$ python
>>> (5.0 + (2 * 80))/2
82.5

So, what changed? Before, we were telling Python and Ruby to take an integer and divide it by another integer (in this case, 165 was being divided by 2). These languages take their cues from C, an older programming language that makes a distinction between integer math and decimal/floating-point math.

The code that I originally posted comes out roughly to this in C:

#include <stdio.h>
int main (void) {
   int x = 165/2;
   printf("%i", x);
   return 0;
}

This will output 82 just like Python did, because we’re telling it to do the math with integers. If we change it to this, however, things change:

#include <stdio.h>
int main (void) {
   float x = 165.0 / 2;
   printf("%f", x);
   return 0;
}

Here, we’re explicitly telling the compiler to use floating-point math, which will provide us with a more accurate answer.

This digs down to the foundations of the languages. In Python, you don’t have to explicitly give your variables a type as you do in C. If you’ll notice in the C code, when we were doing division of integers, we stored it in an “int,” which means a variable meant to hold an integer. In the code with the correct answer, we had to tell it we wanted floating-point math.

Python takes care of this for us, but sometimes it can burn you. It takes a look at your numbers — 165 and 2 — and figures that if you’re using two integers, you want an integer in return.

If you add a decimal point to even one of the numbers, you’ve turned it into a float, and Python will return a decimal number, even if there’s just a zero after the decimal:

>>> 80.0 + 5
85.0
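
For readers on a newer interpreter, here is a minimal sketch of the same behaviour. It assumes Python 3, where the / operator always does floating-point division and a separate // operator reproduces the integer result described above. That split belongs to Python 3, not to the Python 2.5 used in this post, and the variable names are only for illustration.

```python
# Assumes Python 3: "/" is true division, "//" is floor division.
int_result = (5 + (2 * 80)) // 2      # integer math, like the original example
float_result = (5.0 + (2 * 80)) / 2   # one float operand makes the result a float
mixed_sum = 80.0 + 5                  # int + float is promoted to float

print(int_result)    # 82
print(float_result)  # 82.5
print(mixed_sum)     # 85.0
```

On Python 2, writing one of the operands as a float, as in the second line, is what gets you the 82.5.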

Searching for an answer, I stumbled across a blog with a good look at reasonable output of floating-point numbers. If you’re still confused and want another perspective, I suggest you take a look.

Not having to type your variables can often save you a headache, but you have to make sure you’re not introducing a new one in the process. Sorry if I confused you with the last post.

If you’re wondering why Perl gives the correct answer, it’s because Perl’s division operator does floating-point math by default, even when both operands are whole numbers, so it returns a proper answer here. This takes slightly longer to execute, which is why C, a language meant to be lean and fast, doesn’t do it, and languages that follow C’s lead will have the same quirks.


Note: This post contains errors. I’m quite sorry, but see my correction post on the difference between floating-point and integer math for details.

Sometimes when you’re sitting at a computer you need to hammer out a quick calculation. Despite not being a huge math fan, I find this to be true more often than I’d like.

The problem is, I find calculators to be fairly limited and annoying machines to work with. If you make a mistake, you have to go back and do all of those steps over again. As your math gets more complicated, this tends to get more and more aggravating.

What does this have to do with scripting? Well, if you have a Perl, Python or Ruby interpreter on hand, you can hammer out calculations quickly using just the command line.

Perl
With Perl, just fire up a command line and type your calculations (prefaced by the word “print”) in single-quotes after “perl -e”:


kschwen$ perl -e 'print sqrt((5 + (2 * 80))/2) . "\n"'
9.08295106229247

I think it’s pretty nifty because you can see your whole equation mapped out. The ‘. “\n”‘ at the end forces the answer to print on its own line, otherwise the number would run into the beginning of your command prompt.

Perl is widely known for its command-line scripting abilities. Savvy system administrators know that executing a “perl -e” can save a ton of time when there’s work to be done. I’ve actually found a great resource on perl one-liners if you’re interested.

Python
In Python, the easiest way is to just type “python” at the prompt, and do your calculations in the interactive interpreter:


kschwen$ python
>>> from math import sqrt
>>> sqrt((5 + (2 * 80))/2)
9.0553851381374173

I like doing it this way better, as you don’t have to add print statements or force a newline at the end. Also, since you’re in the interpreter, you can import new modules from the math library to do different things as you please.
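
As a rough sketch of what that looks like in practice, here is the same calculation plus one more name pulled from the math module. The 5.0 is there deliberately so the division is done in floating point regardless of interpreter version, and the circle-area line is just a made-up second calculation.

```python
from math import sqrt, pi

# 5.0 forces floating-point division even on an old Python 2 interpreter.
root = sqrt((5.0 + (2 * 80)) / 2)
circle_area = pi * 3 ** 2  # area of a circle with radius 3

print(root)
print(circle_area)
```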

Ruby
I know I haven’t really talked about Ruby, but that’s because I don’t use it. However, Ruby is interesting because you can accomplish this two ways:


kschwen$ ruby -e 'print 5 *5; print "\n";'
25

Or:


kschwen$ irb
>> 5 * 5
=> 25

From what I know, this is because Ruby has two interpreters. The interactive one, “irb”, is like Python’s interpreter — you can play with the language, executing different statements in your session. The actual “ruby” program interprets your script files, but can also be given the -e switch like perl’s interpreter, allowing you to execute one-liners on the command prompt.

If that’s inaccurate, please feel free to correct me, as I am not a ruby person.

In any case, doing math in a scripting language’s interpreter is an interesting and simple way to both play with a language and get some serious math done.


After last week’s mention of AppleScript I want to move on to something even easier and more fun to toy with on OSX. Under your applications folder, you should find a program called Automator.

It’s basically an abstraction layer over AppleScript. Just about anything you can do with Automator can be done by coding AppleScript, but it’s easier to drag some actions and set some options sometimes.

In Automator, you create workflows, which are basically just step-by-step graphical representations of programming functions. You can do things like rename files, create disk images, extract PDF text, get RSS feeds, find iCal items and more. You can even declare and use variables, run outside scripts and use system variables. Some of this stuff — even with proper knowledge of the libraries involved — would take a fair bit of programming skill to pull off.

A fairly simple example is one I use to create archives of files and folders.

[Image: My workflow]

If you create a workflow to match this (by taking action items from the left-hand Automator pane and dragging them to the right side), go to the File menu and save it as a plugin. Select Finder plugin, and give it a name like “Dated backup.”

Now, go to a folder and select some files you’d like to create a dated backup of. Right-click, go to “More” at the bottom of the menu, then go to the “Automator” item and select your plugin. Give it a second.

If all went well, a new file should appear in the folder you’re currently in, containing an archive of the file(s) you selected. It’s a useful workflow that showcases a cool part of Automator, namely how it allows you to manipulate files and folders using Finder and your script as a context-menu plugin.

There’s a bit of a lag, but I believe it’s the actual compression happening, and not an indication that Automator workflows are slower than most scripts. Even if there is some additional execution time, you’ve saved a good chunk of development time.

Tool around with Automator a bit though. I bet you’ll be impressed. I even ran across a guy who uses Automator to move tabs from Opera to Firefox. In his post, he shows off a cool trick where you can record your mouse and keyboard inputs for playback in your script, for when items aren’t nicely AppleScriptable.


I’ve done some Windows-only stuff on this blog already, but I own a Mac, so it’s about time I do something for my homies out there running OSX.

Breaking from the tools that would normally have you launching a Terminal, I’m going to direct your attention to AppleScript. As you might gather, AppleScript is Apple’s own scripting language for OSX, but its real power is that it’s a cinch for a developer to make their applications work with it. What does this mean for you? Oh, just that almost any OSX application can be automated. Even Photoshop (PDF warning!).

If you’re a heavy computer user (and what else would you be if you’re reading this blog?), you can probably think of a few tasks in your daily work flow that could benefit from automation. So as not to limit the utility of this though, let’s talk specifically about Folder Actions.

Folder Actions is a cool built-in tool that allows you to attach an AppleScript to a specific folder on your computer. Then, depending on how you write your script, you can have your script do things automatically when items are added or removed from that folder, or when that folder is opened, closed or moved.

The syntax is fairly simple, if a bit verbose:

on adding folder items to my_folder after receiving added_items
    repeat with this_item in added_items
         -- do stuff with this_item (each newly added file)
    end repeat
end adding folder items to

To illustrate some of what’s possible here, I ran into a post showing how to script scanning files with Sophos, a popular anti-virus program, once they’re dropped into a specific folder. Another cool script comes from Mac OSX Hints, which is a great resource for these kinds of things, showing you how to automatically resize images.

My favorite, though, has to be a script that comes by default with Cyberduck — a good, free FTP client — that allows you to automatically upload files. Given that Cyberduck supports Amazon’s S3, you could write a script that can turn a folder on your computer into a portal to nigh-unlimited storage. If you work with people afraid of using FTP (I’ve met a few), you could use this script to allow them to simply drop files into a local folder for uploads. Combined with some other scripts, like file conversion, renaming and image resizing, you might start to see how using AppleScript and Folder Actions could really benefit your work flow.

OSX actually comes with a few scripts pre-installed, such as image conversion. I found a great visual tutorial to help you get started by playing with the built-in scripts.

Happy scripting!


When I first started with Python, I noticed that it had a built-in utility for parsing XML. After using regular expressions to rip through XML files as chunks of structured text (not a fun experience), I thought it would be an interesting idea to attempt it in Python using the built-in minidom parser. As a student of online journalism, I know a lot of data can be found in XML, including data from the National Weather Service. The ability to automate the fetching of data using XML and some scripting is very cool, and insanely useful if you have the right feed.

The test feed I used (and our test feed here) is one of the most-updated XML feeds I can think of: the Twitter public timeline. This XML feed updates about once per minute with the most recent posts to Twitter from all over the world. I decided to parse a Twitter feed and display people’s names and tweets, just to see how easy it would be.

As always, code first:

from urllib2 import urlopen
from xml.dom import minidom

feed = urlopen("http://twitter.com/statuses/public_timeline.xml")

doc = minidom.parse(feed)

#Get all doc elements matching a given tag
names = doc.getElementsByTagName("screen_name") #Get all screen_name elements
updates = doc.getElementsByTagName("text") #Get all text elements

tweets = zip(names, updates)
for tweeter_node, tweet_node in tweets:
    tweeter = tweeter_node.childNodes[0].nodeValue
    tweet = tweet_node.childNodes[0].nodeValue
    print "%s: %s" % (tweeter, tweet)

Astute readers will see that now we’re using the urllib2 library instead of urllib. The reason is that urllib2’s urlopen() function lets us treat a URL like a local file handle instead of just caching it locally.

Our next step is to use the parse function of minidom. This function takes a handle to a file and returns a minidom object with the XML data structured and accessible through its methods. In XML, data is set between tags, such as <name>Ken Schwencke</name>. Using the minidom, we can return a set of objects contained within name tags by calling the getElementsByTagName() function of a minidom object returned from the parse() function earlier.

So we do this to the screen_name and text tags in the Twitter feed in order to grab all of the tweets and tweeters in the file.

We’re stuck with an odd problem now, though: there’s a one-to-one relationship between each element in the “names” and “updates” lists, so how do we iterate through them both at the same time? We need to combine them into one list and iterate through that.

Python’s built-in zip() function comes in handy here. It takes the corresponding elements of separate lists and “zips” them together into one. For example, if we had two lists of names that had a one-to-one relationship:

>>> first_name = ("Ken", "Adam")
>>> last_name = ("Schwencke", "Wynn")
>>> zip (first_name, last_name)
[('Ken', 'Schwencke'), ('Adam', 'Wynn')]

As you see, the zip() function combined the proper first and last names into matching tuples, all contained within one larger list.
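
Here is the same idea as a small script, as a sketch rather than anything definitive. One assumption to flag: in Python 3, zip() returns a lazy iterator instead of a list, so the example wraps it in list() to get the output shown above. The full_names list is only there for illustration.

```python
first_name = ("Ken", "Adam")
last_name = ("Schwencke", "Wynn")

# In Python 3, zip() returns a lazy iterator, so wrap it in list() to see the pairs.
pairs = list(zip(first_name, last_name))
print(pairs)

# Unpack each pair directly in the for loop.
full_names = []
for first, last in pairs:
    full_names.append(first + " " + last)
print(full_names)
```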

Of course, the first thing we do after zipping the lists into one is split it back up in the for loop. Now that each element in the tweets list corresponds to a matching names/updates pair, we can iterate through the list.

Here’s where the magic happens, as far as getting data is involved:

tweeter = tweeter_node.childNodes[0].nodeValue
tweet = tweet_node.childNodes[0].nodeValue

Since the Twitter feed is fairly simple, the elements we’re looking at don’t have child elements. That is, the only thing between matching screen_name tags is the screen name itself. There are no tags nested between them. Same with all text tags. If there were more, the parsing would get more complicated, but this is a “ridiculously straight-forward example.”

So we take the first child, which is the text node holding that content, and access its nodeValue. This is the actual data between the XML tags. Now it’s just a matter of printing out the relevant data:

print "%s: %s" % (tweeter, tweet)

A “%s” inside of a string is Python shorthand for “a string variable will go here later.” The % that follows the string means that we’re passing a tuple with the values for Python to plug into the preceding string. In this case, I want the “tweeter” (the name from the screen_name XML tags) followed by the “tweet” itself (culled from the text XML tags).

That’s it! You’ve just parsed your first XML feed in Python.
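
If you want to try the parsing steps without depending on a live feed, here is a minimal sketch that runs the same logic over a small XML string. The sample data is made up; only the screen_name and text tag names come from the script above. It uses minidom.parseString() in place of parse() because the input is a string rather than a file handle, and it is written to run on Python 3.

```python
from xml.dom import minidom

# Hypothetical sample data standing in for the live feed; the tag names
# screen_name and text come from the post, everything else is made up.
SAMPLE = """<statuses>
  <status><text>Hello world</text><user><screen_name>alice</screen_name></user></status>
  <status><text>Parsing XML</text><user><screen_name>bob</screen_name></user></status>
</statuses>"""

doc = minidom.parseString(SAMPLE)
names = doc.getElementsByTagName("screen_name")
updates = doc.getElementsByTagName("text")

lines = []
for name_node, text_node in zip(names, updates):
    # childNodes[0] is the text node inside the element; nodeValue is its string.
    tweeter = name_node.childNodes[0].nodeValue
    tweet = text_node.childNodes[0].nodeValue
    lines.append("%s: %s" % (tweeter, tweet))

print("\n".join(lines))
```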

Since I promised multiple examples, here’s another. Get the last published weather information from your nearest airport, or other weather-monitoring station:


from urllib2 import urlopen
from xml.dom import minidom

#Feed for the Gainesville airport.
feed = urlopen("http://www.weather.gov/xml/current_obs/KGNV.xml")

doc = minidom.parse(feed)

loc = doc.getElementsByTagName("location")
temp_f = doc.getElementsByTagName("temperature_string")
time = doc.getElementsByTagName("observation_time")

location = loc[0].childNodes[0].nodeValue
temperature = temp_f[0].childNodes[0].nodeValue
date = time[0].childNodes[0].nodeValue

print "It is %s at %s. %s" % (temperature, location, date)

Find your nearest location and plug it into the urlopen() function.
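
The three near-identical lookups in that script could be folded into one small helper. This is only a sketch: first_text() is a hypothetical name, the sample observation is invented, and the code is written for Python 3 with parseString() so it runs without a network connection.

```python
from xml.dom import minidom

def first_text(doc, tag):
    """Return the text of the first <tag> element, or None if absent or empty."""
    nodes = doc.getElementsByTagName(tag)
    if not nodes or not nodes[0].childNodes:
        return None
    return nodes[0].childNodes[0].nodeValue

# Hypothetical observation; the tag names are the ones used in the script above.
SAMPLE = """<current_observation>
  <location>Example Airport, FL</location>
  <observation_time>Last Updated on Jan 1, 12:00 pm EST</observation_time>
  <temperature_string>70.0 F (21.1 C)</temperature_string>
</current_observation>"""

doc = minidom.parseString(SAMPLE)
report = "It is %s at %s. %s" % (
    first_text(doc, "temperature_string"),
    first_text(doc, "location"),
    first_text(doc, "observation_time"),
)
print(report)
```

Returning None for a missing tag means a feed that lacks a field won’t crash the script with an index error.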


If you have Python on your computer, you have access to a powerful way to learn: the Python interpreter itself. It allows you to interactively test out code and see the result. So with that said, fire up your Python interpreter. If you’re on Windows, either open your command prompt (start menu->run->cmd.exe) and type “python,” or navigate to your start menu, click on programs, then find ActiveState Python and click the interpreter.

If you’re on Mac or Linux, open your terminal and simply type “python.” It should look something like this:

Python 2.5.1 (r251:54863, Apr 15 2008, 22:57:26)
[GCC 4.0.1 (Apple Inc. build 5465)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>>

Picking up from the last post (sorry this didn’t get up sooner):

If you don’t have one already, create a file called links.txt on your desktop and add in a few dummy links on separate lines. Three should suffice. Now, at the “>>>” prompt, paste in the following: f = file("C:/Documents and Settings/YOUR-USER-NAME/Desktop/links.txt", "r") and hit enter. Nothing should happen, and that’s fine. All you did was open a file.

Now, type f.readline() and hit enter. Whoa! The first link in your file. An aptly named function, eh?

So if f.readline() reads a single line, it stands to reason that f.readlines() will collectively read in all of the lines in a file. It also does something extra useful, which is split them up into a list by line. In Python, you access elements of a list with the [] operator, so f.readlines()[0] (because as we all know, in programming you index starting with 0) is the same as f.readline().

However, if you call f.readline() followed by f.readlines() you might notice that the second time around, you’re missing the first link. This is because the file object remembers where you were in the file while using these functions, and reads only the lines you haven’t accessed yet.
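
You can watch the file position move with a short sketch like this one. It assumes Python 3 and uses io.StringIO, which behaves like an open text file, so you don’t need a real links.txt. The example.com links are placeholders, and seek(0) is the standard way to rewind to the start.

```python
import io

# io.StringIO behaves like an open text file, so no real links.txt is needed.
f = io.StringIO("http://example.com/a\nhttp://example.com/b\nhttp://example.com/c\n")

first = f.readline()   # consumes the first line
rest = f.readlines()   # returns only the lines not yet read

f.seek(0)              # rewind to the start of the "file"
everything = f.readlines()

print(repr(first))
print(rest)
print(len(everything))
```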

So what’s with enumerate() and the two variables we had in that for loop before?

Go back to the interpreter and type:

>>> for x in enumerate(f.readlines()):
...     print x

Make sure you hit at least two spaces before print, because Python is whitespace-sensitive, meaning things like a for loop, which will execute code within its scope, only know what to execute if it’s spaced properly. When a block of code is indented properly under other code, like a loop, we say it’s within the “scope” of the loop.

You should see a set of information displayed on your screen now for each link. We call this a tuple. It’s like a list, but you can’t change the contents. The first number is the number returned from the enumerate() function, letting us know where we are in the loop. The other is the link itself.

When you supply one variable to hold the value of a function that returns a tuple, that variable will hold the tuple itself. However, you can split the tuple into two (or more) different variables by providing multiple variables to hold the values, just like we did.
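
A minimal sketch of both styles side by side, using a plain list of placeholder links instead of a file so it runs anywhere:

```python
links = ["http://example.com/a", "http://example.com/b"]

# One variable: each item is a (position, value) tuple.
pairs = [x for x in enumerate(links)]

# Two variables: the tuple is unpacked into n and link.
labels = []
for n, link in enumerate(links):
    labels.append(str(n) + ": " + link)

print(pairs)
print(labels)
```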

So that’s the explanation I promised you on Friday. Sorry about that.


Most people, when they decide they want to learn how to program or script, probably want to do something involving the Internet. At the very least, it’s a good way to show off the power of a scripting language like Python. You might be floored by how easy it is to download a file. When I came across this post on fetching a URL and downloading it to a file, a little light bulb went off above my head.

Let’s make something useful.

If you’ll reference the first post on automatically saving the clipboard in Windows, you might see where this is heading. Then again, maybe not. So let’s get to it.

We’re actually creating two scripts this time. The first is a modified version of the auto-save script, which will allow you to save a list of links to a file on your desktop (or wherever), called links.txt. The second, when run, will parse the links.txt file and download all of the files from the Internet. Once more, I’ll start with the full code for the first script:


import win32clipboard as w

w.OpenClipboard()
d=w.GetClipboardData(w.CF_TEXT)
w.CloseClipboard()

f = file("C:/Documents and Settings/YOUR-USER-NAME/Desktop/links.txt", "a")
f.write(d)
f.close()

You should refer to the first post if you need help understanding this. A few changes: first, we no longer need to import datetime, since we don’t have to name the file with the current date and time. The second is the line where we open the file:


f = file("C:/Documents and Settings/YOUR-USER-NAME/Desktop/links.txt", "a")

I’m using the file() function here because I came across some information that, apparently, open() is an alias for file(). It’s a matter of preference, but I’d rather use the real function. The other change here, besides the different filename, is the “a” at the end. Previously, we used “w” because we were writing to a new file; “a” stands for “append,” and will both create the file and allow us to continually write new information to the end of it if it already exists.
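
To see the append behaviour without touching the clipboard, here is a small sketch. It writes to a throwaway file in a temporary directory (a stand-in for links.txt), and it uses open() rather than file() because file() no longer exists in Python 3, which this example assumes.

```python
import os
import tempfile

# A throwaway path in a temporary directory (hypothetical stand-in for links.txt).
path = os.path.join(tempfile.mkdtemp(), "links.txt")

for link in ("http://example.com/a\n", "http://example.com/b\n"):
    f = open(path, "a")   # "a" creates the file if needed, then appends
    f.write(link)
    f.close()

f = open(path, "r")
saved = f.readlines()
f.close()
print(saved)
```

Had the loop opened the file with “w” instead, only the second link would survive.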

Here comes the second script:


from urllib import urlretrieve

f = file("C:/Documents and Settings/YOUR-USER-NAME/Desktop/links.txt", "r")
for n, link in enumerate(f.readlines()):
    urlretrieve(link, "C:/Documents and Settings/YOUR-USER-NAME/Desktop/" + str(n) + ".html")
f.close()

That’s it. Five lines of code. Let’s break it down.

from urllib import urlretrieve

As before, this just gives us access to the urlretrieve() function of the urllib library. Urlretrieve downloads a URL to a temporary location unless you pass it a file to save to, but we’ll get to that in a minute.

f = file("C:/Documents and Settings/YOUR-USER-NAME/Desktop/links.txt", "r")
for n, link in enumerate(f.readlines()):

The first line we should be familiar with by now; this time we pass file() (or open()) an “r” because we wish to read from a file.

The next line looks a little tricky, though. It’s a Python for loop, which allows you to iterate over multiple objects or a list of some sort. The variables “n” and “link” store where we are in the list and the item in the list, respectively. Where is this list coming from? Well, that explanation will come in the next post (I had to split it up because it was veering off too much into Python syntax and data types).

Suffice it to say for now that enumerate() takes a list of some sort and returns two variables: a counter that increases by one each time (starting with 0) and whatever the value was that was originally in that position in the list. This way, we can keep track of where we are while looping. I only use it for naming the file here, but it can be useful in other ways.

Let’s move on though, shall we?
    urlretrieve(link, "C:/Documents and Settings/YOUR-USER-NAME/Desktop/" + str(n) + ".html")

Note the spacing before the function. It’s because Python is whitespace-sensitive. Putting spaces there denotes that urlretrieve() is within the scope of the for loop, i.e., the loop will execute that code as many times as it needs to.

In any case, this line just downloads the value in the link variable (one of the lines from the file), and saves it to your desktop sequentially. The str(n) part there converts “n,” a variable holding the current position in the list, to a string, which allows us to append it as the file name.

After that, we simply f.close() the file to be good programmers. We don’t need to indent the file closing because we only want that executed once, when all of the looping is done with.

That’s it. Save the file somewhere, call it something like “autodownload.py,” and double-click it whenever you’ve stored up some things in links.txt you want to cache locally. Feel free to create a directory somewhere to store the files in, and tell the script to download things to there. No need to clutter the desktop.
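
If you’d like to check which file each link will land in before downloading anything, a dry run along these lines might help. It is a sketch under a few assumptions: plan_downloads() is a hypothetical helper, the C:/downloads/ folder is a placeholder, and the code targets Python 3. It also strips each line first, since readlines() keeps the trailing newline on every link.

```python
import io

def plan_downloads(lines, folder):
    """Pair each non-empty link with a numbered .html path under folder."""
    plan = []
    for n, line in enumerate(lines):
        link = line.strip()          # drop the trailing newline readlines() keeps
        if link:
            plan.append((link, folder + str(n) + ".html"))
    return plan

fake_file = io.StringIO("http://example.com/a\nhttp://example.com/b\n")
plan = plan_downloads(fake_file.readlines(), "C:/downloads/")
for link, target in plan:
    print(link + " -> " + target)
```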

Now, you might catch something here: what happens if you get a new set of links and download them? Won’t the enumerating start over again, causing the other cached files to be overwritten? Good catch. If you want to plan for this sort of thing, you’ll need to create an md5 hash of the URLs and store with that file name.

An md5 hash simply turns a given string into a fixed-length string of characters that is, for practical purposes, unique to it. It’s not much more work: just add import md5 to the top of the file, and replace the str(n) code with md5.new(link).hexdigest(). Now your filenames should never collide, unless you’re repeatedly downloading the same URL, in which case you probably want them to overwrite.

That leaves us with:

from urllib import urlretrieve
import md5

f = file("C:/Documents and Settings/YOUR-USER-NAME/Desktop/links.txt", "r")
for link in f.readlines():
    urlretrieve(link, "C:/Documents and Settings/YOUR-USER-NAME/Desktop/" + md5.new(link).hexdigest() + ".html")
f.close()

Note that I got rid of the “n” variable and the enumerate() function, because they were only there for naming the files.
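
On newer Pythons the standalone md5 module is gone and the same hashing lives in hashlib. Here is a minimal sketch of the naming step on its own, assuming Python 3. filename_for() is a hypothetical helper, and stripping the link before hashing is my own addition so that a trailing newline doesn’t change the name.

```python
import hashlib

def filename_for(link):
    """Name a download after the MD5 hex digest of its URL."""
    digest = hashlib.md5(link.strip().encode("utf-8")).hexdigest()
    return digest + ".html"

a = filename_for("http://example.com/a")
b = filename_for("http://example.com/b")
print(a)
print(b)
```

The same URL always produces the same filename, so re-downloading a link overwrites its earlier copy instead of creating a new file.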

Toy with the code a bit, see what you can get it to do. Let me know how it works out for you! Check back in a day or two for the explanation of the enumerate() function.