Issue 36: Missing Tag
Status:  Fixed
Owner:
Closed:  Dec 2012
Project Member Reported by login...@gmail.com, Dec 12, 2011
As you know, I can see users' searches on the scraper. I will use this issue to report missing tags:

Extended
Dec 12, 2011
Project Member #1 addicted...@gmail.com
Extended has been added
Keep 'em coming :-)

Can I get access to the searches on the parser? I'd be very interested in knowing how many searches are done on a daily basis.

BTW, I gave you access to Google Analytics for this project. As you may notice, traffic has increased significantly in the past few days :-)
https://www.google.com/analytics/web/
Labels: -Type-Defect -Priority-Low Type-Enhancement Priority-Medium
Dec 13, 2011
Project Member #2 login...@gmail.com
Ehm...
Currently (excluding you and me), nobody is using our parser.
Unfortunately, only the latest CSI version works well; older versions had a bug that led users to reinstall Transmission without torrentexpander.

I could automatically send you the logs by email.
Dec 13, 2011
Project Member #3 addicted...@gmail.com
The least I can say is that this is disappointing.
Let's keep up the good work and hope people will start using it and enjoy it.

I'll be interested in the logs when the parser starts being used by many people. It will really help improve the script.
Dec 14, 2011
Project Member #4 login...@gmail.com
Users are not a problem. Issue 37
Dec 14, 2011
Project Member #5 login...@gmail.com
Renaming is not entirely correct:
Mai.Dire.Grande.Fratello.12.E07.iTALiAN.PDTV.XviD-EXiT.avi
Mai Dire Grande Fratello 12 E07_.avi
Dec 14, 2011
Project Member #6 login...@gmail.com
Another case, found in the scraper log:
Cavendish+Documentary+From+Sky+Sports_
Dec 15, 2011
Project Member #7 login...@gmail.com
Missing tags: 'Theatrical Cut'
Labels: -Priority-Medium Priority-High
Dec 16, 2011
Project Member #8 addicted...@gmail.com
The latest build solves all of this
I'm so glad I'm using variables and regexp conversion to manage patterns :-)
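
For readers following the renaming cases in this thread, here is a minimal sketch of one way to clean a release name: replace the separators with spaces, then cut the title at the first known tag. It is not torrentexpander's actual code. The function name `clean_title`, the variable `TAGS` and the short tag list are hypothetical and only serve the illustration.

```shell
# Hypothetical tag list; the real script manages its own patterns.
TAGS="italian pdtv xvid extended dvdscreener"

# Print a cleaned title for the release name given as $1.
clean_title() {
    printf '%s\n' "$1" | awk -v tags="$TAGS" '
    BEGIN {
        # Build a lookup table of lower-cased tags.
        n = split(tolower(tags), t, " ")
        for (i = 1; i <= n; i++) known[t[i]] = 1
    }
    {
        sub(/\.(avi|mkv|mp4)$/, "")   # drop a few common video extensions
        gsub(/[._+-]+/, " ")          # turn separators into single spaces
        m = split($0, w, " ")
        out = ""
        for (i = 1; i <= m; i++) {
            key = tolower(w[i])
            if (key in known) break   # stop at the first known tag
            out = (out == "" ? w[i] : out " " w[i])
        }
        print out
    }'
}
```

Calling `clean_title 'Mai.Dire.Grande.Fratello.12.E07.iTALiAN.PDTV.XviD-EXiT.avi'` should print the title without the trailing tags. Cutting at the first tag also explains why a word such as "Documentary" is risky as a pattern: everything after it would be lost.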
Dec 16, 2011
Project Member #9 login...@gmail.com
Sorry if I was confusing.
When a user searches for Cavendish Documentary From Sky Sports_, the log shows Cavendish+Documentary+From+Sky+Sports_

The filename itself does not contain plus signs
Dec 16, 2011
Project Member #10 login...@gmail.com
Thanks for the update. Switched to medium priority. I think the top priority now is Issue 13, which is useful for implementing Issue 30 in CSI.
Labels: -Priority-High Priority-Medium
Dec 17, 2011
Project Member #11 addicted...@gmail.com
What do you think is the best approach?
Considering "documentary" as a pattern that should be removed may be a bad idea. Many movies and documentaries include the word documentary in their title.
Dec 17, 2011
Project Member #12 login...@gmail.com
I think "documentary" is not a pattern.
Dec 18, 2011
Project Member #13 login...@gmail.com
Now there are some active users. Missing tags found in the logs:

Dvdscreener
Spanish
X264
Divx
Dvdriptorrents (can we split it?)
Hdtvrip
Mvo (???)

Wrong renaming:
2004+The+Swan+Princess

Also, please tell me which line I have to add to crontab to send you the logs.
This is the main command: 
cat /var/log/apache2/access.log | grep imdb
Dec 18, 2011
Project Member #14 addicted...@gmail.com
Hi
I never tried it and I can't test it right now, but once ssmtp is configured on your server, this line should do the trick
0 0 * * * echo -ne "$(cat /var/log/apache2/access.log | grep imdb)" | mail -v -s "Latest Torrentexpander Queries" mymailadress@gmail.com

Of course, mymailadress@gmail.com has to be replaced by my real mail address :-D
I'm really glad new people started using torrentexpander.

Dec 20, 2011
Project Member #15 addicted...@gmail.com
Here is a better command.
We don't need the full logs to improve tags recognition.
0 0 * * * echo -ne "$(cat /var/log/apache2/access.log.1 | grep imdb | sed 's;^.*/imdbWebService\.php?m\=\(.*\)\&o\=xml.*$;\1;g' | sed 's;%28;(;g' | sed 's;%29;);g' | sed 's;\+; ;g')" | mail -v -s "Latest Torrentexpander Queries" mymailadress@gmail.com

Thanks
Dec 21, 2011
Project Member #16 login...@gmail.com
The -v option is unknown here.
access.log.1 is the old log; should we switch to access.log?
You should receive mail from now on.
Dec 27, 2011
Project Member #17 login...@gmail.com
I tried this line, but it does not work from cron; it does work when run manually.

0 0 * * * root echo -ne "$(cat /var/log/apache2/access.log | grep imdb | sed 's;^.*/imdbWebService\.php?m\=\(.*\)\&o\=xml.*$;\1;g' | sed 's;%28;(;g' | sed 's;%29;);g' | sed 's;\+; ;g')" | mail -s "Latest Torrentexpander Queries" addicteffefddffgdghggrghtgdghsfgsfg@gmail.com

/bin/sh: -c: line 0: unexpected EOF while looking for matching `''
/bin/sh: -c: line 1: syntax error: unexpected end of file
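
The error above is consistent with a documented cron behaviour: in a crontab command field, an unescaped `%` is turned into a newline, and everything after the first `%` is passed to the command as standard input. The shell would then receive a command cut off inside the first `sed 's;%28;...'` expression, which leaves an unmatched single quote. Running the same line by hand is not affected, which may explain why the manual test succeeds. Note also that the `root` user field is only valid in `/etc/crontab` or `/etc/cron.d` files, not in a crontab edited with `crontab -e`. Below is a minimal sketch of escaping the percent signs before pasting a command into a crontab; `escape_for_cron` is a hypothetical helper name.

```shell
# Print the command given as $1 with every % escaped as \%,
# so that cron passes it to the shell unchanged.
escape_for_cron() {
    printf '%s\n' "$1" | sed 's/%/\\%/g'
}
```

For example, `escape_for_cron "sed 's;%28;(;g'"` prints the expression with `\%28` in place of `%28`. An alternative that avoids escaping altogether is to put the pipeline in a script file and call that script from cron.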
Dec 27, 2011
Project Member #18 addicted...@gmail.com
Here's another line that should work in cron.
First check that the binary paths are correct:
which echo
which cat
which grep
which sed
which mail

59 23 * * * /bin/echo -ne "$(/bin/cat /var/log/apache2/access.log | /bin/grep imdb | /bin/sed 's;^.*/imdbWebService\.php?m\=\(.*\)\&o\=xml.*$;\1;g' | /bin/sed 's;%28;(;g' | /bin/sed 's;%29;);g' | /bin/sed 's;\+; ;g')" | /usr/bin/mail -s "Latest Torrentexpander Queries" addicteffefddffgdghggrghtgdghsfgsfg@gmail.com

Let me know how that works for you
Dec 29, 2011
Project Member #19 login...@gmail.com
It works manually! I hope it works automatically too!

Some new missing tags:
Dubbed
Collection
Screener
Remastered
Season
Nlsubs
Hd1080p

Dec 29, 2011
Project Member #20 addicted...@gmail.com
Added these patterns to the current build
We'll see in a few hours if logs are now automatically sent
Jan 3, 2012
Project Member #21 addicted...@gmail.com
I never received any log from cron
Have you made sure mail is set up for root, since this seems to be a root cron job?
Thanks
Jan 4, 2012
Project Member #22 login...@gmail.com
I have made some changes; I hope it works now.
Jan 4, 2012
Project Member #23 addicted...@gmail.com
Just received a report
Is it from your cron ?

I'll edit and send you a new line in order to remove duplicates and garbage

Thank you
Jan 4, 2012
Project Member #24 login...@gmail.com
No, it's a manual test.
Jan 4, 2012
Project Member #25 addicted...@gmail.com
In about 600 lines there aren't a lot of missing tags
I'll add a few missing patterns like "xxx" and "[. _-].*[. _-]subs"
I'll check the rest this week-end

I'm glad people are using torrentexpander now :-)

To get rid of all the garbage, the cron line should be
59 23 * * * /bin/echo -ne "$(/bin/cat /var/log/apache2/access.log | /bin/grep imdb | /bin/sed 's;^.*/imdbWebService\.php?m\=\(.*\)\&o\=xml.*$;\1;g' | /bin/sed 's;%28;(;g' | /bin/sed 's;%29;);g' | /bin/sed 's;\+; ;g' | /usr/bin/sed "s;%20; ;g" | /usr/bin/grep -v "[Ii]mdb[Ww]eb[Ss]ervice" | /usr/bin/sort | /usr/bin/uniq)" | /usr/bin/mail -s "Latest Torrentexpander Queries" addicteffefddffgdghggrghtgdghsfgsfg@gmail.com

Have a nice evening, thanks for all your work
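
As a rough sketch of the same filtering outside the crontab line, the pipeline can be written as a small shell function that reads an access log on standard input. The URL layout (`/imdbWebService.php?m=...&o=xml`) is taken from the sed expressions in this thread; the function name `extract_queries` is hypothetical, and only the escapes already handled above (`%28`, `%29`, `%20`, `+`) are decoded.

```shell
# Read access-log lines on stdin and print the unique search queries.
extract_queries() {
    # Keep only requests that carry a query (drops bare /imdbWebService.php hits).
    grep 'imdbWebService\.php?m=' |
    # Keep the text between "m=" and "&o=xml".
    sed 's;^.*/imdbWebService\.php?m=\(.*\)&o=xml.*$;\1;' |
    # Decode the few URL escapes seen in the logs.
    sed -e 's/%28/(/g' -e 's/%29/)/g' -e 's/%20/ /g' -e 's/+/ /g' |
    # Sort and remove duplicates.
    sort -u
}
```

Saved in a script, it could be run as `extract_queries < /var/log/apache2/access.log` and piped to `mail`. Calling a script from cron also keeps the `%` characters out of the crontab entry.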
Jan 4, 2012
Project Member #26 login...@gmail.com
Sent new test mail
Jan 4, 2012
Project Member #27 addicted...@gmail.com
Looks like we're not there yet
I forwarded you the e-mails
Jan 4, 2012
Project Member #28 login...@gmail.com
What's wrong?
Jan 4, 2012
Project Member #29 addicted...@gmail.com
First mail sent at 23:59 contained a few garbage lines
Second mail sent at 00:17 was empty

I'll try to replicate your setup and try it on my server this week-end
Jan 5, 2012
Project Member #30 login...@gmail.com
Ok, but..what is a 'garbage' line?
Jan 5, 2012
Project Member #31 login...@gmail.com
Missing tags:
Plsubbed

I see a lot of TV series in the logs, but they are not supposed to be there.
Isn't the IMDb feature available only for films right now? Anyway, season/episode tagging would be another missing tag:

S??
S??EP???
EP??
Jan 5, 2012
Project Member #32 addicted...@gmail.com
I'll see the full log this weekend
By garbage lines, I meant duplicates and this kind of line:
183.97.156.227 - - [27/Dec/2011:04:50:38 +0100] "GET /imdbWebService.php HTTP/1.1" 200 393 "http://chk.co-cc-domain.net/open_url_list.php?p=1" "Mozilla/5.0 (Windows NT 5.1; rv:7.0.1) Gecko/20100101 Firefox/7.0.1"

The line in comment 25 should rid us of all that

TV series are there because those do not respect the SXXEXX pattern and because they include one of the movies patterns. Unless we find a common pattern to all of these, I guess there's nothing we can do about it
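
One possible way to spot the markers listed in comment 31 (S??, S??EP???, EP??), plus the usual SXXEXX form, is a case-insensitive extended regular expression. This is only a sketch under the assumption that markers are delimited by non-alphanumeric characters; `has_episode_marker` is a hypothetical name and not part of torrentexpander.

```shell
# Succeed (exit status 0) if the name in $1 contains a season/episode marker
# such as S02, S02E05, S02EP013, E07 or EP12.
has_episode_marker() {
    printf '%s\n' "$1" |
        grep -Eiq '(^|[^[:alnum:]])(s[0-9]{1,2}(ep?[0-9]{1,3})?|ep?[0-9]{1,3})([^[:alnum:]]|$)'
}
```

For instance, `has_episode_marker 'Mai.Dire.Grande.Fratello.12.E07.iTALiAN.PDTV.XviD-EXiT.avi'` succeeds because of `E07`, while a plain movie query such as `2004 The Swan Princess` does not match.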
Jan 5, 2012
Project Member #33 addicted...@gmail.com
Latest log only contains one entry
I'll look into it this weekend
Jan 5, 2012
Project Member #34 login...@gmail.com
The garbage lines will stop shortly, when my account on co.cc expires.
Jan 12, 2012
#35 luk...@gmail.com
some more tags:
"PLSUB" "PLSUBBED" "brrrip" "TSXVID"   "XviD"  "DivXNL"   "divx"   "Subtit"     "Subs "   "Subs."    "Subs-"    "Subs_"   "NL Subs"   "KLAXXON"  "aXXo"   "BRRip"   "BDRip"    "Bluray"   "HDTV"   "HR HDTV"   "R5"  "Telesync"   "TELECINE"   "Webrip"    "vomit"   "Dita"   "DVB"  "Omifast"    "@KIDZ"   "KIDZCORNER"   "1080"  "720"  "480"   "x264"   "H264"   "AC3"   "AC-3"   "FXG"   ".TS"   "TS."    " TS"    "-TS"   "TS-"   "NTSC"   " WS"    "WS."    ".WS"   "NL "    "NLT"  "CN "   "TC "    "ISO."   "Swesub"  "VHS"  "READNFO"   "ViCiOsO"   "WorkPrint"   "ExtraTorrent"   "2Lions"   " VOSTFR"   "FxM"   "DUQA"   "newartriot"   "nHaNc3"   "DDC"   "keltz"   "REAL PROPER"   "PROPER"    "DEWSTRR"   "CVCD"   "VCD"   "LIMITED"   "Electri4ka"   "Electrichka"   "NORARS"   "aceford"   "jigaxx"   "ShortKut"   "danger2u"   "www."   "www "   "1 of"  "1of"   "2 of"   "2of"   "3 of"   "3of"   "cd1"   "cd2"   "cd3"  "1CD"   "2CD"  "1 CD"    "PDVD-RIP"    "PDVD"    "PDV"    "Pre DVD"   "Pre-DVD"    "DVD"    "PPVRIP"   "www"   "1CDRip"   "2CDRip"   "UNCUT "    "Director Cut"    "Directors"   "Director's"     " TPB"   "PSP"   "PDTV"   "iPod"   "Zune"    ".avi"   "mp4"   "mpg"   "3gp"  "wmv"    "CAMELOT"    "CAM"  "mkv"   "m4"   "xRipp"   "Goblin10"  "By .. DragonLord721"   "EXTENDED"   "Los Sustitutos"    "BR-Scr"  "BR-Screener"   "SCREENER"   "SCR "     "SCR."   "UNRATED"   "REPACK"   "HQ"  "RETAIL"   "1337x"   "Noir"   "NEW SOURCE"   "DiTa"    "UVall"   "FQM"   "CHGRP"   "LMAO"   "NoTV"   "DVSKY"   "DSR"   "2HD"   "2Wire"   "Ekolb"   "SHAMNBOYZ"  "!!!"  
"~"  "ExtraScene"   "CHUPPI"   "MAXSPEED"  "ShareReactor"  "ShareZONE"  "ShareGo"   "aAF"    "xRG"    "STV"   "-MAX"   "iNTERNAL"    "RESYNC"   "SYNC-"   "SYNCFIX"   "TRUEFRENCH"    "FRENCH"   "ENGLISH"   "SPANISH"   "iTA "   "iTALIA"  "Hindi"   "GERMAN"   " ENG"   ".ENG" "187HD"             "HR HDTV"   "FQM"   "LMAO"   "XOXO"   "eztv"   "PDV"   "PDTV"   "TSXVID"   "XviD"   "DSR"   "DivXNL"   "Divx"   "2HD"   "2WIRE"   "NL Subs"   "KLAXXON"   "aXXo"   "NoTV"   "BRRip"   "BDRip"   "Bluray"   "HDTV"   "R5"   "BYU"   "DVB"   "Omifast"   "@KIDZ"   "KIDZCORNER"   "AC3"   "AC-3"   "FXG"   "NTSC"   " WS"   "WS."   ".WS"   "NL "   "NLT"   "CN "   "TC "   "ISO."   "Swesub"   "VHS"   "READNFO"   "ViCiOsO"   "WorkPrint"   "OneDDL.com"   "fwint.com"   "  Demonoid com  "   "ExtraTorrent com"   "ExtraTorrent"   "VOST "   " VOSTFR"   "FxM"   "DDC"   "keltz"   "REAL PROPER"   "PROPER"   "CVCD"   "VCD"   "LIMITED"   "www."   "www "   "PDVD"   "PDVD-RIP"   "PPVRIP"   "www"   "1CDRip"   "2CDRip"   "Pre DVD"   "Pre-DVD"   "DVD"   "UNCUT "   " TPB"   "PSP"   "iPod"   "Zune"   "mp4"   "mpg"   "3gp"   "wmv"   "mkv"   "m4"   "xRipp"   "YesTV"   "CRIMSON"   "EXTENDED"   "BR-Scr"   "BR-Screener"   "SCREENER"   "SCR "   "SCR."   "UNRATED"   "REPACK"   "HQ"   "RETAIL"   "Noir"   "NEW SOURCE"   "DiTa"   "SHAMNBOYZ"   "!!!"   "ExtraScene"   "MAXSPEED"   "ShareReactor"   "ShareZONE"   "ShareGo"   "aAF"   "xRG"   "STV"   "-MAX"   "RESYNC"   "SYNC-"   "SYNCFIX"   "TRUEFRENCH"   "iTA "   "_BBC"   "_ITV"   "_Channel 4"   "_Film4"  "cw4f"   "w4f"
Jan 12, 2012
Project Member #36 addicted...@gmail.com
Wow !
This is a huge list of patterns :-)
I'll add them this week-end

Thanks !
Jan 13, 2012
Project Member #37 addicted...@gmail.com
I integrated a bunch of them into the SVN revision I'll commit really soon.
I also allowed user defined patterns to be added to the settings.ini file.

Here are the patterns that are not yet integrated
DEWSTRR
CVCD
VCD
Electri4ka
Electrichka
NORARS
aceford
jigaxx
ShortKut
danger2u
PDVD-RIP
PDVD
PDV
Pre DVD
Pre-DVD
DVD
Director Cut
Directors
Director's
TPB
PSP
iPod
Zune
CAMELOT
xRipp
Goblin10
DragonLord721
EXTENDED
Los Sustitutos
REPACK
HQ
1337x
NEW SOURCE
UVall
CHGRP
LMAO
NoTV
DVSKY
DSR
2HD
2Wire
Ekolb
SHAMNBOYZ
ExtraScene
CHUPPI
MAXSPEED
ShareReactor
ShareZONE
ShareGo
xRG
STV
SYNCFIX
TRUEFRENCH
eztv
2WIRE
AC3
AC-3
FXG
NTSC
VHS
ViCiOsO
OneDDL.com
fwint.com
Demonoid.com
VOST
FxM
DDC
ShareReactor
ShareZONE
ShareGo
_ITV
_Channel 4
_Film4
Feb 2, 2012
Project Member #38 login...@gmail.com
After the server change, I'm now using lighttpd, so we have to change the mail command.
Mar 29, 2012
Project Member #39 login...@gmail.com
We have to add these tags, I think:
3D, SBS
Apr 1, 2012
Project Member #40 login...@gmail.com
dvdrip, Dvd, Eng, dvd5, dvd9, torrents, Www, X264, dvdripspanish, dvdscr, Torrent, fansub
Apr 1, 2012
Project Member #41 login...@gmail.com
half-sbs, full-sbs
Dec 2, 2012
Project Member #42 addicted...@gmail.com
Sorry for my lack of updates during the last few (many) months.
New job that takes most of my time, new apartment that requires a lot of work, not enough time remaining to take care of torrentexpander.
Sorry for that

Those tags have been added

Thanks for your input

   Addictedtoscreens
Status: Fixed