Issue 20: Awesome imdb integration!
Status:  Fixed
Owner: ----
Closed:  Dec 2011
Project Member Reported by login...@gmail.com, Nov 13, 2011
It is possible to grab some film information from imdb!
How? Let me explain...

IMDB WEB SCRAPER (Open Source)
http://lab.abhinayrathore.com/imdb/imdbWebService.php?m=Titanic&o=xml

EXAMPLE CODE
# Grab IMDB information table
# $FILM is 'FILMNAME (YEAR)'
# -O writes the response to imdb.xml; the URL is quoted because it contains &
wget -O imdb.xml "http://lab.abhinayrathore.com/imdb/imdbWebService.php?m=$FILM&o=xml"

# Grab Film Title (Awesome for PCH!!)
# Many languages available!
# Popcorn users will LOVE this! The NMT jukebox requires the English title only!!
# With this scraper you can search using e.g. the French/Italian/Spanish film name and get the English film name required by the Jukebox!

TITLE=`xpath -q -e '//ALSO_KNOWN_AS' imdb.xml | grep Italy | sed 's/<[^>]*>//g' | sed 's/=.*//g' | sed 's/&amp;#x27;/'"'"'/g'`;

# Grab Film year
YEAR=`xpath -q -e '//YEAR' imdb.xml | sed 's/<[^>]*>//g'`;

# Grab Film POSTER (Awesome!!)
# Many posters available!
POSTER=`xpath -q -e '//POSTER_SMALL' imdb.xml | sed 's/<[^>]*>//g'`;
wget "$POSTER"

# Grab Rating (Awesome!!)
RATING=`xpath -q -e '//RATING' imdb.xml | sed 's/<[^>]*>//g'`;

# Grab imdb title (tt0120338)
# Useful for Popcorn Hour!!!
IMDB=`xpath -q -e '//TITLE_ID' imdb.xml | sed 's/<[^>]*>//g'`;
Nov 14, 2011
Project Member #1 login...@gmail.com
THE FIRST IDEA: build .nfo files for Popcorn Hour NMT jukebox.

Why? 
The NMT jukebox needs the English film title to get movie data from the internet.
If you live outside the USA, you won't be able to use the Jukebox until you manually rename your films to their English titles.

How to solve the problem?
The best way to solve the problem is to place an .nfo file next to the film that contains the correct imdb url. This way, no renaming is required. More info at: http://www.networkedmediatank.com/showthread.php?tid=46095

Example:
News Movie (2008).mkv // Italian title of 'The Onion Movie'
News Movie (2008).nfo // It contains correct imdb url: http://www.imdb.com/title/tt0392878/

How to do this?
Simple!

$FILM is News Movie (2008)

wget -O imdb.xml "http://lab.abhinayrathore.com/imdb/imdbWebService.php?m=$FILM&o=xml&callback=%3F&submit=Call"

IMDB_URL=`xpath -q -e '//IMDB_URL' imdb.xml | sed 's/<[^>]*>//g'`;

echo "$IMDB_URL" >> "$FILM.nfo"
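A minimal sketch of the last step, assuming the imdb url has already been extracted. The function name write_nfo is made up for illustration; it derives the .nfo name from the video file name so the two always match, and it refuses to create an empty .nfo when the lookup returned nothing.

```shell
# write_nfo VIDEO_FILE IMDB_URL   (hypothetical helper, not part of torrentexpander)
# Writes IMDB_URL into an .nfo file that sits next to VIDEO_FILE and shares its base name.
write_nfo() {
    nfo_video=$1
    nfo_url=$2
    # Do nothing (and report failure) when the lookup gave no url
    [ -n "$nfo_url" ] || return 1
    # Strip the extension of the video file and append .nfo
    printf '%s\n' "$nfo_url" > "${nfo_video%.*}.nfo"
}
```

For example, write_nfo "News Movie (2008).mkv" "http://www.imdb.com/title/tt0392878/" would create "News Movie (2008).nfo" in the same folder.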
Nov 15, 2011
Project Member #2 login...@gmail.com
Quick & (Absolutely) Dirty implementation
Needs nfo destination folder at line 610.

torrentexpander_imdb.sh
78.7 KB
Nov 15, 2011
Project Member #3 addicted...@gmail.com
Thanks for your input and really sorry for not getting back to you earlier.
I'll definitely look into it next week-end.
I'll need to look into wget and make sure there is a timeout that can be configured.
I'll also need to make sure wget is installed and input its path.
Also one thing I'll have to look into is that http://labaia.hellospace.net/imdbWebService.php website. It's amazing what it is able to spit out.
I'll also need to make sure the rest of the script won't choke on those nfo files.
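A rough sketch of the wget checks listed above, assuming curl as a fallback when wget is not installed. The names detect_downloader and fetch and the 15 second default are made up for illustration and are not taken from torrentexpander.

```shell
# Print the name of the first downloader found on PATH: "wget", "curl", or nothing.
detect_downloader() {
    if command -v wget >/dev/null 2>&1; then
        echo wget
    elif command -v curl >/dev/null 2>&1; then
        echo curl
    fi
}

# fetch URL OUTPUT_FILE [TIMEOUT_SECONDS]   (hypothetical helper)
# Downloads URL to OUTPUT_FILE and gives up after the timeout instead of hanging.
fetch() {
    fetch_url=$1
    fetch_out=$2
    fetch_timeout=${3:-15}
    case $(detect_downloader) in
        wget) wget -q --timeout="$fetch_timeout" --tries=2 -O "$fetch_out" "$fetch_url" ;;
        curl) curl -s -f --max-time "$fetch_timeout" -o "$fetch_out" "$fetch_url" ;;
        *)    echo "Neither wget nor curl was found" >&2; return 127 ;;
    esac
}
```

Because fetch returns a non-zero status on failure, the calling script can skip the nfo/poster step and carry on when the lookup site is down.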

Thanks for your help

Nov 16, 2011
Project Member #4 login...@gmail.com
Scraper file n.1
imdb.php
9.9 KB
Nov 16, 2011
Project Member #5 login...@gmail.com
Scraper file n.2
imdbWebService.php
2.3 KB
Nov 16, 2011
Project Member #6 login...@gmail.com
New version of modded scripts.
Added Poster download, some minor update.
I don't know whether the PCH supports xpath. I doubt it.
torrentexpander_imdb_2.sh
79.2 KB
Nov 16, 2011
Project Member #7 addicted...@gmail.com
Thanks
I browsed through the whole script and here's what I plan on doing in terms of IMDB integration.
Let me know what you think of it.
I won't start coding until I'm sure I haven't forgotten anything... also I only have time to work on it during weekends...

- add the imdb options in the script parameter
- add this new parameter to the settings.ini file
- Detect wget path
- Save movie title / series title in a new variable (X) and file name after it has been renamed in another variable (Y) :
	-> If there are multiple video files with a movie pattern / series pattern in the surrounding folder, retain the name of the folder in its imdb perfect match format (remove season.* from the end of the name if it's a series pack)
	-> If only one video file with a movie pattern, retain the name of the file without its extension in its perfect match format
- Make sure http://labaia.hellospace.net/imdbWebService.php is up and running
- Add the lines you kindly supplied in a new subroutine right before the "Convert DTS track..." routine. If lookup fails, the script has to be able to recover.
	Also :
	-> Title stored in variable X will be fetched for NFO and JPG
	-> Add nfo and jpg extensions to the movies_extensions_rev variable
	-> If single file movie, and NFO/JPG files were downloaded (count files), name those NFO and JPG files as variable_Y.extension and :
		- put all these files (movie included) in a new folder named variable_Y
		- rewrite $log_files with those files and the new folder
	-> If multi files movie / series pack, fetch variable X for NFO / JPG and store / name them as the files included in the surrounding folder (there will then be movie_part_1.nfo, movie_part_1.jpg, movie_part_2.nfo, movie_part_2.jpg + movie_part_1.avi and movie_part_2.avi in this folder... maybe much more for a series pack) and list all that in the $log_files file.
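
The choice of lookup title (variable X) described in the plan could be sketched like this. It is only an illustration: the function name lookup_name and the exact season pattern are assumptions, and the real script's "perfect match" cleaning is not reproduced here.

```shell
# lookup_name PATH VIDEO_COUNT   (hypothetical helper)
# PATH is the video file (single file case) or the surrounding folder (multi file case).
lookup_name() {
    ln_base=$(basename "$1")
    if [ "$2" -gt 1 ]; then
        # Several video files: use the folder name and drop a trailing "season..." part
        printf '%s\n' "$ln_base" | sed 's/[ ._-]*[Ss]eason.*$//'
    else
        # One video file: use the file name without its extension
        printf '%s\n' "${ln_base%.*}"
    fi
}
```

With this sketch, a single file "The Onion Movie (2008).mkv" is looked up as "The Onion Movie (2008)", and a folder named "Some Show Season 2" holding several episodes is looked up as "Some Show".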

And of course credit you in the script to thank you for your help :-)

Thanks again for your interest in torrentexpander
Nov 17, 2011
Project Member #8 login...@gmail.com
1) Yes, add imdb options and settings.
I suggest "produce_imdb_nfo", "download film poster", "poster_format". The "poster_format" variable can be 'normal, large, small, full'.
2) Of course.
3) Ok, but .. what about xpath, are you able to replace its function? ;-)
4) Example is required. ( ?? )
5) Of course.
6) I think the most important thing is "let users choose what to do".

If a user wants to rename files using the 'type_3' rules but also wants to produce the .nfo and .jpg of the movie, he can do so, even though the 'type_1' rules are needed to get the correct imdb information.

The imdb implementation must be separate from the renaming script, but the film name must match the .nfo & .jpg filename. A user may decide not to rename files but still want to use imdb.
Nov 18, 2011
Project Member #10 login...@gmail.com
This is my concept map.
Please have a look. I'm not sure that everything is correct and matches torrentexpander's features.
Feel free to edit the concept map; we could use it for documentation.
Thanks for programming the best automatic rename tool in the world.
concept map.docx
82.1 KB
Nov 18, 2011
Project Member #11 addicted...@gmail.com
Hi
Thank you for all the time you spent improving torrentexpander.
I finally took time to give your imdb integration routine a try.
wget is not always installed by default (for example on Mac OS X), so I tried curl instead.
Depending on which one is installed, I'll automatically switch to the right one.
I kinda improved some lines by storing the xml in a variable and dropped xpath dependencies.

I'll spend time on imdb integration and your concept map this week-end.
Thanks for your help.

PS: I'm no programmer and I wrote my first lines of code when I started torrentexpander not so long ago, so it's nice to know you like it.

Take a look at the rewriting :

	# IMDB integration
	nfo_file=`echo "$title_clean_ter_other_pat".nfo`;
	poster=`echo "$title_clean_ter_other_pat".jpg`;
	xml_cont="$(curl -i "http://labaia.hellospace.net/imdbWebService.php?m=$title_clean_ter_other_pat&o=xml")"
	wait
	imdb_url=`echo "$(echo $xml_cont | egrep -o "<IMDB_URL>.*</IMDB_URL>" | sed -e 's;\(<IMDB_URL>\)\(.*\)\(</IMDB_URL>\);\2;')"`;
	poster_url=`echo "$(echo $xml_cont | egrep -o "<POSTER>.*</POSTER>" | sed -e 's;\(<POSTER>\)\(.*\)\(</POSTER>\);\2;')"`;
	if [ "$imdb_url" != "" ]; then
		step_number=$(( $step_number + 1 ))
		echo "Step $step_number : Building .nfo";
		echo "$imdb_url" > "$destination_folder/$nfo_file";
	fi
	if [ "$poster_url" != "" ]; then
		step_number=$(( $step_number + 1 ))
		echo "Step $step_number : Downloading Poster";
		curl -o "$destination_folder/$poster" "$poster_url";
		# wget -q -O "$destination_folder/$poster" $poster_url;
		wait
	fi
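
Since the same egrep/sed pair is repeated for every tag, it could be wrapped in one small function. This is only a sketch: the name xml_tag is made up, and the sample XML below is a hypothetical one-line layout (the root element name is an assumption), not real output of the web service. It only handles a tag whose content sits on one line and contains no nested tags.

```shell
# xml_tag TAG XML_STRING   (hypothetical helper)
# Prints the text of the first <TAG>...</TAG> element found, or nothing if there is none.
xml_tag() {
    printf '%s\n' "$2" | grep -o "<$1>[^<]*</$1>" | head -n 1 | sed 's/<[^>]*>//g'
}

# Hypothetical sample, for illustration only
sample_xml='<MOVIE><TITLE_ID>tt0392878</TITLE_ID><IMDB_URL>http://www.imdb.com/title/tt0392878/</IMDB_URL><YEAR>2008</YEAR></MOVIE>'

imdb_url=$(xml_tag IMDB_URL "$sample_xml")
title_id=$(xml_tag TITLE_ID "$sample_xml")
```

A missing tag simply yields an empty string, so the existing if [ "$imdb_url" != "" ] checks keep working unchanged.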
Nov 18, 2011
Project Member #12 login...@gmail.com
Excellent!

I'm studying a way to get fanart images using this:
http://api.themoviedb.org/2.1/methods/Movie.getImages

To prepare for this feature, I suggest grabbing TITLE_ID.
title_id=`echo "$(echo $xml_cont | egrep -o "<TITLE_ID>.*</TITLE_ID>" | sed -e 's;\(<TITLE_ID>\)\(.*\)\(</TITLE_ID>\);\2;')"`;


Nov 18, 2011
Project Member #13 login...@gmail.com
Very simple!

fanart=`echo "$title_clean_ter_other_pat".fanart.jpg`;
wget/curl http://api.themoviedb.org/2.1/Movie.getImages/en/xml/57983e31fb435df4df77afb854740ea9/$title_id

Then grab the url of a random backdrop image in size $fanart_size (the user chooses the size depending on their TV).

wget -q -O "$destination_folder/$fanart" "$fanart_url";
Nov 19, 2011
Project Member #14 addicted...@gmail.com
Hi Loginbug
I just created a Torrentexpander 101 wiki page to help you understand the basic structure of torrentexpander
https://code.google.com/p/torrentexpander/wiki/Torrentexpander_in_depth?ts=1321742965&updated=Torrentexpander_in_depth
Your idea of maintaining a concept map is great, but due to the length of the script, we'll need to use modeling software.
Torrentexpander is only 800 lines long but it is already fairly complex. I only started this project a few months ago and I am already losing track of what line does what and why it does it.
Right now, I'm reviewing the whole script in order to refresh my memory and be more efficient while adding the imdb functionality.
Nov 19, 2011
Project Member #15 login...@gmail.com
Thanks, I will read it.
Anyway, I found a bug in the curl command of your imdb script. This is the correct way:

# curl dislikes spaces in the URL; replace spaces with +
title_clean_ter_other_pat_nospace=`echo "$title_clean_ter_other_pat" | sed 's/ /+/g'`;
xml_cont="$(curl -i "http://labaia.hellospace.net/imdbWebService.php?m=$title_clean_ter_other_pat_nospace&o=xml")"
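
Spaces are not the only characters that can upset the query: titles in the 'FILMNAME (YEAR)' format also contain parentheses, and some contain & or '. A more general sketch, assuming plain ASCII titles (accented characters would need byte-wise handling that is not shown here); the function name urlencode is made up for illustration.

```shell
# urlencode STRING   (hypothetical helper, ASCII input only)
# Keeps letters, digits and . ~ _ - as they are, turns spaces into +,
# and percent-encodes every other character.
urlencode() {
    ue_rest=$1
    ue_out=
    while [ -n "$ue_rest" ]; do
        ue_tail=${ue_rest#?}              # everything after the first character
        ue_char=${ue_rest%"$ue_tail"}     # the first character itself
        ue_rest=$ue_tail
        case $ue_char in
            [a-zA-Z0-9.~_-]) ue_out="$ue_out$ue_char" ;;
            ' ')             ue_out="$ue_out+" ;;
            *)               ue_out="$ue_out$(printf '%%%02X' "'$ue_char")" ;;
        esac
    done
    printf '%s\n' "$ue_out"
}
```

The result can be dropped into the m= parameter in place of $title_clean_ter_other_pat_nospace.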

Nov 20, 2011
Project Member #17 login...@gmail.com
The latest working version of the modded script.
NFO + POSTER + FANART available
I prefer to use the grep command instead of egrep, and xml files instead of variables

torrentexpander_imdb_tmdb.sh
81.2 KB
Nov 20, 2011
Project Member #18 addicted...@gmail.com
Check out SVN release r81
IMDB is now integrated
I still have an issue with curl not setting the mime type for images
Also, I commented out the fanart lines because I haven't had enough time to make them work

You need to enable this at the beginning of the script or in your settings.ini file:
imdb_poster="yes"
imdb_poster_format="normal"
imdb_nfo="yes"
imdb_fanart="yes"
imdb_fanart_format="w1280"
Nov 21, 2011
Project Member #20 login...@gmail.com
Good, but the imdb plugin should work even if 'clean_filename'=no
Nov 21, 2011
Project Member #21 login...@gmail.com
I have seen that the script is not able to rename files ( ...CD1.avi & ...CD2.avi ) inside a folder (the folder itself is renamed correctly).
Nov 22, 2011
Project Member #22 addicted...@gmail.com
Regarding comment 20 : SVN release r83 doesn't require clean_filename to be turned on for IMDB routine to work.
Regarding comment 21 : long ago, I decided not to rename files if several files are found in a torrent.

There are too many patterns (CD1/CD2, moviea/movieb, movie1/movie2, moviepart1/moviepart2, and so on)
Also, what happens if the torrent contains TV Episodes, Subtitles (especially idx/sub)...
Renaming files from a multi files torrent would be really likely to fuck up, trust me on that ;-)

Once I'm done adding fanart and making sure no nfo/jpg is generated for non movie files (srt, idx, sub subtitles), I'll ask you to test it thoroughly and confirm that it works fine - for now everything seems OK.

Thanks again
Nov 23, 2011
Project Member #23 login...@gmail.com
Thanks.
Yes, i trust you.
Nov 23, 2011
Project Member #24 login...@gmail.com
If the destination directory already exists, the program stops with: 'destination folder is not empty'
I think the program should continue and put the files inside it (only if the filename is NOT the same).

Example
Suppose that you have downloaded two versions of the same film

1) First version 
It's a folder named /Avatar.2009.Xvid-MYDAD/
--> Avatar.2009.Xvid.CD1-MYDAD.avi
--> Avatar.2009.Xvid.CD2-MYDAD.avi

2) Second version 
It's a folder named /Avatar.2009.Xvid-MYMUM/
--> Avatar.2009.Xvid.CD1-MYMUM.avi
--> Avatar.2009.Xvid.CD2-MYMUM.avi

After running torrentexpander on both folders with the 'type_1' schema, I should get:
Folder: Avatar (2009)
--> Avatar.2009.Xvid.CD1-MYDAD.avi
--> Avatar.2009.Xvid.CD1-MYDAD.nfo
--> Avatar.2009.Xvid.CD1-MYDAD.jpg
--> Avatar.2009.Xvid.CD2-MYDAD.avi
--> Avatar.2009.Xvid.CD2-MYDAD.nfo
--> Avatar.2009.Xvid.CD2-MYDAD.jpg
--> Avatar.2009.Xvid.CD1-MYMUM.avi
--> Avatar.2009.Xvid.CD1-MYMUM.nfo
--> Avatar.2009.Xvid.CD1-MYMUM.jpg
--> Avatar.2009.Xvid.CD2-MYMUM.avi
--> Avatar.2009.Xvid.CD2-MYMUM.nfo
--> Avatar.2009.Xvid.CD2-MYMUM.jpg

I think this is a good result. It's tidy.

If the destination file already exists, damn! Is it possible to rename only the folder?
If /Avatar (2009)/ exists, then the new folder could be /Avatar (2009) [1]/
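
A minimal sketch of that folder naming idea. The function name unique_dir is made up for illustration, and the " [n]" suffix simply follows the example above; it is not how torrentexpander currently behaves.

```shell
# unique_dir WANTED_PATH   (hypothetical helper)
# Prints WANTED_PATH if nothing exists there yet, otherwise the first free
# "WANTED_PATH [1]", "WANTED_PATH [2]", ... candidate.
unique_dir() {
    ud_candidate=$1
    ud_n=1
    while [ -e "$ud_candidate" ]; do
        ud_candidate="$1 [$ud_n]"
        ud_n=$((ud_n + 1))
    done
    printf '%s\n' "$ud_candidate"
}
```

Usage would look like destination=$(unique_dir "$destination_folder/Avatar (2009)") followed by mkdir "$destination", where $destination_folder is the variable name used earlier in this thread.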


Nov 23, 2011
Project Member #25 login...@gmail.com
Some code needs to be added to avoid creating an empty file.

if [ "$imdb_poster" = "yes" ] && [ "$poster_url" != "" ]; then "$wget_curl" -q "$poster_url" -O "$temp_folder_without_slash/temp_poster"; wait; fi

I suggest you use xml files instead of xml variables, for debugging reasons.
It would be very nice if torrentexpander had a --debug option (for example, debug mode would keep the imdb.xml and themoviedb.xml files)
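
The empty file case can also be caught after the download, because a failed download may still leave a zero byte file behind. A small sketch under that assumption; the function name keep_if_nonempty is made up for illustration.

```shell
# keep_if_nonempty TEMP_FILE FINAL_FILE   (hypothetical helper)
# Moves TEMP_FILE to FINAL_FILE only if it has content; otherwise deletes it
# and returns a non-zero status so the caller can log the failure.
keep_if_nonempty() {
    if [ -s "$1" ]; then
        mv "$1" "$2"
    else
        rm -f "$1"
        return 1
    fi
}
```

For example, keep_if_nonempty "$temp_folder_without_slash/temp_poster" "$destination_folder/$poster" would run right after the download line above.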

Nov 25, 2011
Project Member #26 login...@gmail.com
Torrentexpander: better IMDB TMDB plugin
+ Does not create empty files
+ Debug support ( I really need it )
- Only wget at the moment

imdb_tmdb_debug.sh
6.7 KB
Nov 26, 2011
Project Member #27 addicted...@gmail.com
Thank you for this
I made some minor changes to your code and included it in the latest SVN revision
I'm sticking with variables instead of xml files, but added some more information to the debug log.
Also, I improved the rename routine so that determining the IMDB title works faster.

I couldn't get fanart to work. The xml looks like this :
1 3 true false en Adaptation. Adaptation. The Orchid Thief movie 2757 tt0268126 http://www.themoviedb.org/movie/2757 Charlie Kaufman (Cage) writes the way he lives, with great difficulty. His twin brother Donald (also Cage) lives the way he writes, with foolish abandon. Susan (Streep) writes about life, but can't live it. John's (Cooper) life is a book, waiting to be adapted. One story. Four lives. A million ways it can end. 19 8.0 R 2002-06-12 114 1228 2011-11-26 14:57:49 UTC

This is what it is supposed to look like :
http://api.themoviedb.org/2.1/methods/Movie.imdbLookup

I'm 
Nov 27, 2011
Project Member #29 addicted...@gmail.com
The line that's already in the script should work and doesn't rely on new commands like tr
The problem is that none of the xml downloaded from tmdb contains any backdrop... while the movies in question obviously have backdrops.

The XML looks like this :
1 3 true false en Adaptation. Adaptation. The Orchid Thief movie 2757 tt0268126 http://www.themoviedb.org/movie/2757 Charlie Kaufman (Cage) writes the way he lives, with great difficulty. His twin brother Donald (also Cage) lives the way he writes, with foolish abandon. Susan (Streep) writes about life, but can't live it. John's (Cooper) life is a book, waiting to be adapted. One story. Four lives. A million ways it can end. 19 8.0 R 2002-06-12 114 1228 2011-11-26 14:57:49 UTC

On the website, there are about 10 backdrops :
http://www.themoviedb.org/movie/2757-adaptation

Nov 27, 2011
Project Member #31 login...@gmail.com
This is my xml file downloaded with sample script.
log.txt
10.4 KB
Nov 27, 2011
Project Member #33 login...@gmail.com
This sample script works well for me!

sample.sh
433 bytes
Nov 27, 2011
Project Member #34 addicted...@gmail.com
Thank you
Fanart now works in latest SVN build
TMDB servers must have fucked up yesterday evening
Nov 27, 2011
Project Member #35 addicted...@gmail.com
Switched it to enhancement
Labels: -Type-Defect Type-Enhancement
Dec 4, 2011
Project Member #36 login...@gmail.com
(No comment was entered for this change.)
Status: Fixed